Unpacking the Costs of Databricks Serverless Compute: Insights from Zipher’s Analysis
February 11, 2025
At the Data+AI Summit 2024, Databricks announced the General Availability (GA) of Serverless Compute for notebooks, workflows, and Delta Live Tables. After months of public preview, Serverless Compute is now available on AWS and is expected to launch on Azure in February 2025.
The promise of Databricks Serverless is appealing: an efficient compute model that eliminates infrastructure management while enabling seamless data workloads. This hands-off approach ensures data teams can focus on driving insights without the need to tune Spark clusters manually. But with great simplicity comes the critical question: is it worth the cost?
The Million-Dollar Question: Is Serverless Compute Cost-Effective?

The cost of Serverless Compute has been a hot topic among users and experts alike. Reports from the data engineering community consistently highlight significant cost concerns:
Cost Explosions in Practice
A Reddit user, Reasonable_Tooth_501, shared that moving jobs to Serverless resulted in a 5x cost increase, in line with experiments reported elsewhere in the industry that found classic compute to be roughly 5x cheaper on average.
Uncertainty in DBU calculation
Serverless pricing introduces uncertainty because there is no transparency into how DBU consumption is calculated. With Databricks managing resource allocation internally, users have little insight into how costs are determined, which makes the $0.37/DBU rate far less informative, as also reported by other users in this Reddit thread.

When calculating the number of DBUs per hour for AWS instances, the value is determined by the instance type. For example, an m6id.xlarge instance corresponds to 0.76 DBUs per hour. It’s also crucial to account for instance_time, which considers all workers active during a run. If autoscaling is enabled, this calculation effectively integrates the area under the “scaling” graph, reflecting the varying number of workers used over time.
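To make the classic-compute arithmetic concrete, here is a minimal sketch of how a job's DBU cost can be estimated from instance_time and the per-instance DBU rate. The DBU price, the scaling timeline, and the helper names below are hypothetical illustrations; only the 0.76 DBU/hour figure for m6id.xlarge comes from the example above.

```python
# Minimal sketch: estimating classic-compute DBU cost for a single job run.
# The DBU price and scaling timeline are hypothetical examples.
DBU_PER_HOUR = {"m6id.xlarge": 0.76}
DBU_PRICE_USD = 0.15  # assumed jobs-compute $/DBU rate, for illustration only

def instance_hours(scaling_timeline):
    """'Area under the scaling graph': sum of (segment duration in hours x active workers)."""
    return sum(hours * workers for hours, workers in scaling_timeline)

def dbu_cost(instance_type, scaling_timeline):
    dbus = DBU_PER_HOUR[instance_type] * instance_hours(scaling_timeline)
    return dbus * DBU_PRICE_USD

# Example: a one-hour run that autoscaled 4 -> 8 -> 4 workers.
timeline = [(0.25, 4), (0.50, 8), (0.25, 4)]
print(f"${dbu_cost('m6id.xlarge', timeline):.2f} in DBUs (EC2 charges not included)")
```

With classic compute, every term in this calculation is visible to the user; with Serverless, none of them are.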
In the case of Serverless Compute, these details remain unclear. Specifically:
- Instance_time: Is a fixed number of workers used, or does it rely on autoscaling?
- DBU/hour for the selected instance type: Which underlying worker types are selected during the run?
This lack of transparency makes it difficult to predict costs accurately, as users are left uncertain about how Serverless Compute scales resources and calculates DBU consumption.
Higher Costs for Long-Running Workloads
Maria Vechtova (Ahold Delhaize) and Rik Hagnes (Jumbo Supermarkten) noted on LinkedIn that Serverless Compute is not cost-optimized for workflows exceeding one hour (Post), with reports of costs being 2-3x higher than traditional setups (Post).
No Flexibility in Cost vs. Performance Trade-offs
Serverless assumes that one size fits all, while in practice SLAs vary across jobs and over time. Unlike classic compute, where users can balance cost and runtime by customizing cluster configurations, Serverless offers no such control. As cptshrk108 mentioned on Reddit:
“In my experience, it scales up way too much for jobs, and I would tolerate longer compute times at a cheaper price.”
We’ve researched the community’s perspective and conducted a detailed empirical analysis to explore the cost implications of Databricks Serverless Compute. Here’s what we found.
Zipher’s Empirical Analysis of Serverless vs. Classic Compute
To objectively compare the cost and performance of Databricks Serverless Compute against Classic Compute, Zipher used the TPC-DS benchmark with scale factor 1,000, an industry-standard test for data workloads.
Experiment Setup
- Serverless Compute: Default settings were used.
- Classic Compute: Three popular worker types were tested using both spot and on-demand instances:
  - m6id.xlarge
  - c5d.xlarge
  - r6id.xlarge
- Job Configuration:
  - Autoscaling enabled
  - Min Workers: 4
  - Max Workers: 12
  - Databricks Runtime: 14.3 LTS (Spark 3.5.0)
  - Driver: m6id.large (on-demand)
To ensure consistency, all workloads were run three times a day (6 AM, 2 PM, 10 PM) over three weeks, resulting in 63 runs per worker type. Serverless jobs were run simultaneously with the classic jobs to eliminate timing bias.
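For reference, the classic-compute setup above roughly corresponds to a Databricks Jobs API cluster spec like the sketch below. This is an approximation, not the exact spec used in our runs; the spot variant is shown, and any field not listed in the setup above is an assumption.

```python
# Sketch of a Jobs API "new_cluster" spec approximating the classic-compute setup
# (spot variant; use "ON_DEMAND" availability for the on-demand runs).
new_cluster = {
    "spark_version": "14.3.x-scala2.12",                # DBR 14.3 LTS (Spark 3.5.0)
    "node_type_id": "m6id.xlarge",                      # worker type under test
    "driver_node_type_id": "m6id.large",                # on-demand driver
    "autoscale": {"min_workers": 4, "max_workers": 12},
    "aws_attributes": {
        "availability": "SPOT_WITH_FALLBACK",           # fall back to on-demand if spot is unavailable
        "first_on_demand": 1,                           # keep the driver on on-demand capacity
    },
}
```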

Results
| Compute Type | On-Demand | Spot Instances |
| --- | --- | --- |
| Serverless Compute | $9.47 | – |
| Classic Compute (m6id.xlarge) | $4.14 | $2.45 |
| Classic Compute (c5d.xlarge) | $3.08 | $2.51 |
| Classic Compute (r6id.xlarge) | $3.44 | $2.07 |
The analysis showed that Serverless Compute costs were 3x higher than the most cost-efficient on-demand worker type (c5d.xlarge) and 4.5x higher than the most cost-efficient spot worker type (r6id.xlarge).
Notably, the difference in workload duration was modest: Serverless Compute completed fastest at an average of 38 minutes, compared to 43 minutes for on-demand and 59 minutes for spot instances.
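The multipliers quoted above follow directly from the table; a quick back-of-the-envelope check:

```python
# Cost ratios derived from the results table above.
serverless = 9.47
cheapest_on_demand = 3.08  # c5d.xlarge, on-demand
cheapest_spot = 2.07       # r6id.xlarge, spot

print(serverless / cheapest_on_demand)  # ~3.1x
print(serverless / cheapest_spot)       # ~4.6x
```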
Key Observations
- Serverless Simplicity Comes at a Price
While Serverless eliminates the need for manual cluster management, its lack of flexibility can lead to inflated costs, especially for jobs that don’t require rapid spin-up.
- Classic Compute Offers Customization
Classic compute allows for optimizations such as configuring Spark settings, EBS volumes, and instance types, which can significantly reduce costs.
- Zipher’s Optimization Advantage
Zipher’s ML-powered engine automates these decisions by profiling resource utilization and dynamically adjusting configurations. This ensures efficient scaling and resource allocation, resulting in 30-50% cost reduction compared to classic compute, all without any manual intervention.
Conclusion
Databricks Serverless Compute offers a compelling solution for organizations seeking simplicity and ease of use. However, its higher costs, lack of transparency, and absence of customization options make it less suitable for advanced use cases or cost-sensitive workloads.
For those looking to balance cost and performance, Classic Compute remains a robust option, especially when paired with Zipher’s autonomous optimization platform.
Stay tuned for our next blog post, where we’ll dive deeper into the other pros and cons of Serverless Compute and explore how Zipher helps data teams make informed, cost-effective decisions.