Global Water Security Center

Providing decision makers with the most reliable, ground-breaking research, applied scientific techniques, and best practices so that the hydrologic cycle and its potential impacts can be put in a context for appropriate action and response by the United States

Applying Lessons from the Databricks Data + AI Summit to GWSC’s Work

This opinion article was written by GWSC Environmental Data Scientist Dr. Sambadi Majumder.

As a data practitioner, I eagerly anticipate conferences and networking events that align with my field and those on its periphery. Being a Data Scientist with a keen interest in Data Engineering, scaled software design, and spatiotemporal data handling and analysis, I was particularly excited to attend the Data + AI Summit 2024 hosted by Databricks in San Francisco.

As this was my first big tech conference, I was eager to learn as much as I could about industry-level applications of big data management. The keynote speeches highlighted crucial industry trends, with speakers emphasizing the growing demand for data and AI. I thoroughly enjoyed the talk “Your Guide to Data Engineering on the Data Intelligence Platform” by Matt Jones from Databricks and Raju Mudunuri from Lexmark, which made the case that a unified data platform is necessary for reliable and trustworthy data.

Javier de la Torre from CARTO and James Bentley from Landmark Information Group demonstrated how CARTO’s tools integrate seamlessly with Databricks to optimize geospatial data pipelines. They highlighted the use of H3 indexing for efficient calculations and data management, significantly reducing computation times. The session showcased how these innovations are transforming geospatial analytics, making them more accessible and scalable.
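To illustrate why a cell index can cut computation time, here is a minimal sketch in plain Python. It is not the H3 library and not CARTO's implementation: it uses a simple square latitude/longitude grid as a stand-in for H3's hexagonal cells, and the cell size, the `cell_id` helper, and the sample readings are all hypothetical. The idea it shows is the general one: once every point carries a cell identifier, a spatial aggregation becomes an ordinary group-by on that identifier rather than a geometric comparison between every pair of features.

```python
import math
from collections import defaultdict

# Hypothetical cell size in degrees; H3 uses hexagonal cells at fixed
# resolutions instead of a square grid like this one.
CELL_SIZE_DEG = 0.5


def cell_id(lat, lon, size=CELL_SIZE_DEG):
    """Return the (row, col) of the square grid cell containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def mean_by_cell(rows):
    """Average a value per grid cell using a single pass over the data."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum, count]
    for lat, lon, value in rows:
        key = cell_id(lat, lon)
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up sample readings: (latitude, longitude, rainfall in mm).
readings = [
    (37.77, -122.42, 12.0),
    (37.80, -122.27, 8.0),
    (34.05, -118.24, 3.0),
]

cell_means = mean_by_cell(readings)
for cell, value in sorted(cell_means.items()):
    print(cell, value)
```

The first two readings fall in the same cell and are averaged together, while the third lands in a cell of its own. A real pipeline would swap `cell_id` for an actual H3 indexing call and run the group-by on a distributed engine.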

I interacted with experts from CARTO and industry professionals who use CARTO to explore how its geospatial capabilities could benefit GWSC. These conversations were invaluable, providing insights into the sophisticated tech stack required for geospatial applications. I thoroughly enjoyed discussing the finer details and brainstorming how we could integrate these capabilities with our existing Databricks-centric infrastructure.

The summit provided several practical insights that I can apply in my work. I learned about Databricks Asset Bundles and how they can be used within a CI/CD workflow. This will streamline our deployment processes and help keep updates consistent and reliable. The emphasis on a unified data platform and the Medallion architecture reinforced the importance of efficient data organization and management. These actionable insights will help improve our data strategies and operational efficiency.
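For readers unfamiliar with the Medallion architecture mentioned above, it organizes data into layers of increasing quality, commonly called bronze (raw), silver (cleaned), and gold (aggregated and ready for analysis). The sketch below shows that layering in plain Python rather than on Databricks. The station records, field names, and the `to_silver` and `to_gold` helpers are hypothetical and do not represent GWSC's actual pipeline.

```python
from statistics import mean

# Bronze: raw records exactly as ingested (made-up data, all strings).
bronze = [
    {"station": "A", "date": "2024-06-01", "precip_mm": "4.0"},
    {"station": "A", "date": "2024-06-01", "precip_mm": "4.0"},  # duplicate
    {"station": "A", "date": "2024-06-02", "precip_mm": "6.0"},
    {"station": "B", "date": "2024-06-01", "precip_mm": "n/a"},  # unparseable
    {"station": "B", "date": "2024-06-02", "precip_mm": "1.0"},
]


def to_silver(rows):
    """Silver: drop duplicates and unparseable values, convert to numbers."""
    seen = set()
    clean = []
    for row in rows:
        key = (row["station"], row["date"])
        if key in seen:
            continue  # skip repeated station/date pairs
        try:
            value = float(row["precip_mm"])
        except ValueError:
            continue  # skip values that cannot be parsed as numbers
        seen.add(key)
        clean.append(
            {"station": row["station"], "date": row["date"], "precip_mm": value}
        )
    return clean


def to_gold(rows):
    """Gold: aggregate cleaned records into mean precipitation per station."""
    by_station = {}
    for row in rows:
        by_station.setdefault(row["station"], []).append(row["precip_mm"])
    return {station: mean(values) for station, values in by_station.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping the bronze layer untouched means the cleaning rules in `to_silver` can be revised and rerun later without re-ingesting the source data, which is one of the main reasons for separating the layers.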