Run Spark jobs faster and at a fraction of the cost.

XONAI accelerates Spark jobs in your existing cloud data platform or on-premises environment. Activate today without code changes or migrations.

Request Access

Speedup of existing cloud big data platforms when XONAI is activated.

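| Platform | Hardware | Instance | TPC-DS | TPC-H |
|---|---|---|---|---|
| Amazon EMR 6.7.0 | ARM | c6gd.8xlarge | Coming soon | 1.7x |
| Amazon EMR 6.7.0 | Intel | m5d.8xlarge | Coming soon | 1.45x |
| Amazon EMR 6.7.0 | AMD | m5ad.8xlarge | Coming soon | 1.5x |
| Databricks 10.4 (AWS) | ARM | c6gd.8xlarge | Coming soon | 1.6x |
| Databricks 10.4 (AWS) | Intel | m5d.8xlarge | Coming soon | 1.6x |
| Databricks 10.4 (AWS) | AMD | m5a.8xlarge | Coming soon | 1.6x |
| Dataproc | Intel | n2-standard-32 | Coming soon | 2.0x |
| Dataproc | ARM | Coming soon | Coming soon | Coming soon |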

Summary of 3 TB TPC-DS and TPC-H benchmarks, each run on 5 workers of the listed hardware and instance type.
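To relate the speedup factors above to run time, a speedup of s means a job finishes in 1/s of its original time. The sketch below works through that arithmetic for the published TPC-H figures. The function name is illustrative, and treating the result as a cost saving is an assumption that holds only if billing scales linearly with cluster run time.

```python
def runtime_saved(speedup: float) -> float:
    """Fraction of the original run time saved by a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # TPC-H speedup factors taken from the benchmark summary above.
    for factor in (1.45, 1.5, 1.6, 1.7, 2.0):
        print(f"{factor}x speedup -> {runtime_saved(factor):.0%} less run time")
```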

Cache acceleration

[Chart: cache acceleration for Spark and XONAI (lz4, zstd, uncompressed), grouped by compressed and uncompressed cache]

Benchmarked with 300GB TPC-DS and 50GB TPC-H data sets (higher is better)

Our optimized columnar data processing format allows up to 3x better cache compression while delivering significant speedups in any compression scheme.

Cache storage size

[Chart: cache storage size for Spark and XONAI (lz4, zstd, uncompressed), grouped by compressed and uncompressed cache]

Uncompressed caching offers the best performance at the expense of more storage.

Benchmarked with 300GB TPC-DS and 50GB TPC-H data sets (lower is better)

[Diagram: Spark query planning and execution with XONAI. Labels: Query; Databricks Runtime / EMR Runtime (Unresolved Physical Plan, Catalog, Logical Plan, Optimised Logical Plan, Physical Plans, Cost Model, AQE, Selected Physical Plan, JVM RDDs); XONAI Runtime (XONAI Equivalent Plan, Batch metadata, XONAI Engine, MLIR Compiler, Columnar RDDs)]

The leading engine

| XONAI on | EMR | Databricks | Dataproc |
|---|---|---|---|
| Spark API compatible | | | |
| Intel, AMD and ARM | | | |
| Query acceleration | up to 2.7x | up to 2.7x | up to 3.0x |
| Cache acceleration | up to 6x | up to 6x | up to 6x |
| Average memory reduction | up to 40% | up to 40% | up to 40% |
| Faster data scans | | | |

Coming soon!

A unified UI for your data infrastructure

A plug-and-play, Grafana-based UI that connects with your cloud and gives you visibility into cloud spending for all your Spark jobs. It identifies inefficient spending and assesses the cost and performance optimization opportunities available with XONAI, even before you activate it. Our engine provides detailed job execution metrics not available on any other Spark runtime.

Gain detailed visibility over cost and performance metrics of EMR clusters.

Understand how XONAI reduces Spark job costs and improves resource utilization.

See detailed execution and performance metrics unlocked by our engine for Spark applications.

XONAI for Apache Spark

Frequently Asked Questions

Our solution integrates with the open-source Apache Spark 3 distribution and the following data platforms:

- Amazon EMR starting from 6.3.0

- Databricks up to 10.4 (preview)

- Dataproc 2.0.x and 2.1.x release lines (preview)

The solution is activated by a Spark 3 plugin, which runs physical plans equivalent to the ones selected by Spark runtimes. In practice, the spark-submit command points to a JAR that we provide, via the spark.plugins property.

Additionally, our engine requires moving a fraction of the spark.executor.memory to the spark.executor.memoryOverhead setting. This change is needed because our engine allocates off-heap memory to process data rather than JVM memory.
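As a rough illustration of the two changes described above, the following sketch builds a spark-submit command that loads a plugin and moves part of the executor heap budget to memory overhead. The JAR path, the plugin class name and the 50% split are placeholders assumed for the example, not values published by XONAI. The sketch also ignores any overhead already configured for the job. Only the Spark properties themselves (spark.plugins, spark.executor.memory, spark.executor.memoryOverhead) and the --jars option are standard.

```python
import shlex

# Hypothetical placeholders: the real JAR and plugin class are supplied by XONAI.
PLUGIN_JAR = "/path/to/xonai-plugin.jar"
PLUGIN_CLASS = "com.example.XonaiPlugin"


def split_executor_memory(heap_mib: int, offheap_fraction: float) -> tuple[int, int]:
    """Move a fraction of the executor heap budget to off-heap memory overhead.

    Returns (executor_memory_mib, memory_overhead_mib).
    """
    if not 0.0 < offheap_fraction < 1.0:
        raise ValueError("offheap_fraction must be strictly between 0 and 1")
    overhead_mib = int(heap_mib * offheap_fraction)
    return heap_mib - overhead_mib, overhead_mib


def build_submit_command(app: str, heap_mib: int, offheap_fraction: float) -> list[str]:
    """Assemble spark-submit arguments with the plugin and adjusted memory settings."""
    memory_mib, overhead_mib = split_executor_memory(heap_mib, offheap_fraction)
    return [
        "spark-submit",
        "--jars", PLUGIN_JAR,
        "--conf", f"spark.plugins={PLUGIN_CLASS}",
        "--conf", f"spark.executor.memory={memory_mib}m",
        "--conf", f"spark.executor.memoryOverhead={overhead_mib}m",
        app,
    ]


if __name__ == "__main__":
    # Assumed starting point: 32 GiB of executor memory, half moved off-heap.
    print(shlex.join(build_submit_command("my_job.py", 32768, 0.5)))
```

The right fraction depends on the workload and on vendor guidance. The total memory requested per executor stays the same; only its division between JVM heap and overhead changes.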

Existing solutions reduce cloud spending by improving resource provisioning and/or tuning application parameters. They may deliver only a one-time benefit, and only for workloads that are not already optimally deployed.

Our solution accelerates Spark data processing far beyond the default Spark engine (Catalyst), and delivers hardware acceleration and reduced resource utilization regardless of how optimally Spark workloads are already deployed.

No. We intentionally designed our engine to be API-compatible with existing runtimes for Spark, including proprietary ones that may modify query plans to improve performance, such as the EMR runtime.

The more time a query spends on physical computation, the more it is expected to benefit. These are typically queries with heavy aggregation, join and sort stages.

A drop-in solution that can be activated in your cloud environment with no code changes to reduce cloud costs and accelerate insight delivery.

Reduce Spark cloud costs today