Session: Amazon’s Exabyte-Scale Migration from Spark to Ray

This open source case study examines an exabyte-scale data management migration four years in the making. We'll recap Amazon's data warehousing journey from Oracle to Apache Spark, and most recently to Ray, in search of increasingly efficient, scalable, and reliable open distributed computing frameworks. We'll review the migration's progress, the architectural requirements for running thousands of petabyte-scale data processing jobs daily, how Ray on EC2 is improving cost efficiency by over 80% compared with Spark on EMR, and how Amazon is contributing its work back to open source to bring the same benefits to open table formats like Apache Iceberg.

This session will be recorded

Session: Amazon’s Exabyte-Scale Migration from Spark to Ray

Presenters:

Patrick Ames
