🚀 Meet Zipher at Databricks Data + AI Summit, June 10-12! Schedule Now

How OpenWeb Reduced Databricks Spend by 58% with Zipher

About OpenWeb

OpenWeb is building a healthier online conversation ecosystem. Powering thousands of publishers and engaging hundreds of millions of monthly users, the company relies heavily on advanced data infrastructure to deliver real-time experiences at scale.

To support its data products and analytics capabilities, OpenWeb runs a significant portion of its data stack on Databricks and AWS. While powerful, this architecture came with rising infrastructure costs – and limited visibility into where that spend could be optimized.

The Challenge

OpenWeb’s community platform powers conversations for millions of users in real time, generating highly dynamic and unpredictable data workloads. Ingestion volumes fluctuate rapidly based on user activity, publisher events, and global news cycles – making efficient compute management extremely challenging.

By the end of 2024, OpenWeb was running tens of thousands of monthly jobs on Databricks and using hundreds of thousands of vCPU hours on AWS.

Managing compute resources manually was increasingly unsustainable: clusters often had to be provisioned for peak loads, leading to significant wasted spend during off-peak periods.
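To make the cost of peak provisioning concrete, here is a minimal Python sketch. The hourly demand figures are hypothetical and are not OpenWeb's data; the function name `wasted_vcpu_hours` is likewise introduced only for illustration. It compares the vCPU-hours a fixed-size cluster actually uses against the vCPU-hours it keeps running when sized for the busiest hour.

```python
def wasted_vcpu_hours(hourly_demand, provisioned):
    """Return (used, idle) vCPU-hours for a fixed-size cluster.

    hourly_demand: vCPUs actually needed in each hour (hypothetical numbers).
    provisioned:   vCPUs kept running every hour, sized for the peak.
    """
    # Demand above the provisioned size cannot be served, so cap it.
    used = sum(min(d, provisioned) for d in hourly_demand)
    # Everything provisioned but not used is paid for and idle.
    idle = provisioned * len(hourly_demand) - used
    return used, idle


# Hypothetical six-hour load profile with a single spike.
demand = [20, 30, 100, 40, 20, 30]
peak = max(demand)

used, idle = wasted_vcpu_hours(demand, peak)
idle_share = idle / (peak * len(demand))
print(f"used={used} idle={idle} idle_share={idle_share:.0%}")
```

In this made-up profile, a cluster sized for the one-hour spike sits idle for most of the capacity it pays for, which is the pattern the paragraph above describes.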

Despite best efforts to monitor usage, traditional cost dashboards provided only high-level summaries, without the granular, job-by-job visibility needed to optimize such a dynamic environment. Identifying inefficient clusters and right-sizing opportunities required deep technical investigation and constant manual tuning. Even then, the constantly changing nature of the use cases led to suboptimal results.

OpenWeb needed a solution that could continuously adapt to changing data patterns, optimize resource usage at scale, and deliver tangible savings without compromising system reliability or performance.

Enter Zipher

OpenWeb partnered with Zipher, a purpose-built optimization solution for Databricks environments.

How Zipher Works:

Connects to OpenWeb’s Databricks Account
Zipher is a zero-touch solution and was connected with Apache Airflow and Terraform within minutes.

Zero-Touch Optimization
For the first few days, Zipher collected low-level Spark metrics and detailed resource utilization metrics, passively profiling OpenWeb’s workloads. A period of active optimization followed, in which Zipher used its custom profiles to optimize those workloads. The optimizations included setting each cluster’s configuration seconds before a run, as well as a Spark-aware autoscaler at runtime.

Automatic Adjustments
Zipher automatically adjusted to load variations and continues to optimize OpenWeb’s Databricks workloads throughout their lifetime. 
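
The idea of configuring a cluster from a workload profile just before each run can be illustrated with a small Python sketch. This is not Zipher's algorithm, which the case study does not describe. It is a generic right-sizing heuristic under stated assumptions: the history of peak core usage, the 95th-percentile target, the 20% headroom, and the names `percentile` and `suggest_workers` are all hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_workers(peak_cores_history, cores_per_worker, headroom=0.2, pct=95):
    """Suggest a worker count from the peak core usage of past runs.

    Hypothetical heuristic: take a high percentile of past peaks,
    add some headroom, and round up to whole workers.
    """
    target_cores = percentile(peak_cores_history, pct) * (1 + headroom)
    return max(1, math.ceil(target_cores / cores_per_worker))


# Hypothetical profile: peak cores in use during the last ten runs of one job.
history = [34, 40, 38, 52, 36, 41, 39, 37, 44, 35]
workers = suggest_workers(history, cores_per_worker=8)
print(f"suggested workers: {workers}")
```

A static rule like this only sets a starting size. Handling load changes during a run is the job of a runtime autoscaler, which this sketch does not attempt.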

“Once we integrated Zipher’s optimization tool, we saw a significant impact within days. The improved efficiency of our Spark workloads led to substantial cost savings and enhanced performance across our data platform.”
— Stav Hacohen, Data Infrastructure Lead, OpenWeb

The Results

After installing Zipher, OpenWeb saw immediate and sustained improvements in cost efficiency:

Key Outcomes:

  • 63% reduction in AWS compute costs
  • 55% reduction in Databricks spending
  • 5% decrease in spot-loss frequency, improving stability

Performance remained stable throughout the optimization process. Through the Zipher dashboard, OpenWeb had full visibility into every optimization Zipher applied and was able to define SLAs for specific jobs when required. Zipher’s phased implementation allowed the team to monitor reliability metrics and apply changes incrementally, ensuring no disruption to workflows or delivery timelines.

Ready to see how Zipher can reduce your Databricks and compute costs? Book a demo with a Databricks optimization expert here.
