In our day-to-day operations, we often use Python scripts to retrieve and process large volumes of complex data, and R scripts to analyze and visualize it. Certain datasets, such as the NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP-CMIP6), are used by our team quite regularly, and we rely on several sophisticated scripts to process them and extract information.
To further improve the user experience when working with this dataset, I set out to create a Graphical User Interface (GUI) built around some of the scripts in question. The goal was to help team members access, process, and visualize the data without delving into the underlying code. This ease of use is crucial in scenarios where rapid analysis and visualization are imperative, as it allows teams to focus on insights rather than the technicalities of data manipulation.
To make the GUI intuitive and user-friendly, I created a tab for Data Retrieval and a tab for Data Visualization. I also added a third tab that serves as the About section and outlines how to use the tool, along with relevant details about the source data. I chose Streamlit as the front-end framework for this application, and the aim was to have it run locally, accessible via a web browser.
The Data Retrieval tab lets the user select any one of the CMIP6 climate models and variables from their respective dropdowns, specify a geographical region by simply entering a country or continent name in a text box, and enter a target year around which the data will be downloaded. The underlying Python script first fetches the polygon shape of the entered location from the OpenStreetMap dataset (via the OSMnx Python library), calculates a region of interest around it, and then uses this information to spatially subset the daily climate data as it is fetched from the open-access AWS S3 bucket.
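The region-of-interest step can be reduced to padding the geocoded polygon's bounding box. In the actual script the bounds come from OSMnx (e.g. `ox.geocode_to_gdf("France")`); the sketch below shows only the padding logic on plain `(west, south, east, north)` bounds, with the 1-degree padding being an assumed, illustrative value.

```python
# Hedged sketch of the region-of-interest calculation used for spatial
# subsetting; the real bounds come from an OSMnx-geocoded polygon.
def region_of_interest(bounds, pad_deg=1.0):
    """Expand a (west, south, east, north) box by pad_deg degrees,
    clamping to valid longitude/latitude ranges."""
    west, south, east, north = bounds
    return (max(west - pad_deg, -180.0),
            max(south - pad_deg, -90.0),
            min(east + pad_deg, 180.0),
            min(north + pad_deg, 90.0))

# e.g. a rough box around France, padded by one degree on each side
print(region_of_interest((-5.1, 41.3, 9.6, 51.1)))
```

The padded box can then be used to slice the latitude/longitude dimensions of the remote dataset so that only the required subset is downloaded.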
The source CMIP6 data come in daily timesteps, so the script aggregates them temporally to monthly values before exporting the dataset as NetCDF files. It applies the same processing across all four Shared Socioeconomic Pathways (SSPs), running them in parallel to leverage the computational resources of the machine it runs on.
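The two steps above, daily-to-monthly aggregation and the per-SSP fan-out, can be sketched as follows. This is a simplified stand-in using a dummy pandas time series and a thread pool; the real script works on gridded datasets streamed from the NEX-GDDP-CMIP6 S3 bucket and writes each scenario out as NetCDF, and the function names here are illustrative, not the author's.

```python
# Sketch: aggregate daily values to monthly means, once per SSP, in parallel.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

SSPS = ("ssp126", "ssp245", "ssp370", "ssp585")

def monthly_means(daily: pd.Series) -> pd.Series:
    """Aggregate a daily time series to monthly means ("MS" = month start)."""
    return daily.resample("MS").mean()

def process_ssp(ssp: str) -> pd.Series:
    # Stand-in for fetching one scenario's daily data from the S3 bucket.
    days = pd.date_range("2050-01-01", "2050-12-31", freq="D")
    daily = pd.Series(np.arange(len(days), dtype=float), index=days)
    return monthly_means(daily)

# All four scenarios processed concurrently.
with ThreadPoolExecutor(max_workers=len(SSPS)) as pool:
    results = dict(zip(SSPS, pool.map(process_ssp, SSPS)))
```

Each entry in `results` would then be exported (e.g. via `Dataset.to_netcdf` in the gridded case) as one NetCDF file per scenario.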
The final goal of this tool is to visualize a monthly rolling average across a given number of years on a choropleth map. To this end, the data is fetched for a 20-year range around the target year (9 years before and 10 years after it). Once downloaded, the data can be uploaded to the Data Visualization page, where the monthly rolling average around the target year is calculated and then visualized on a map. A slider lets the user explore the patterns for each month, as the maps are generated dynamically whenever the slider is moved from month to month. The basemap comes from the ArcGIS REST Services and is rendered through the folium library in Python.
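The averaging step can be sketched as a per-calendar-month mean over the 20-year window. The example below uses a single random monthly series for brevity; the real script would apply the same grouping per grid cell before handing the twelve resulting fields to the folium map and the month slider. The window arithmetic follows the 9-years-before / 10-years-after convention described above.

```python
# Sketch: mean value for each calendar month across the 20-year window
# around the target year (dummy data; one series instead of a grid).
import numpy as np
import pandas as pd

target_year = 2050
months = pd.date_range(f"{target_year - 9}-01-01",
                       f"{target_year + 10}-12-01", freq="MS")
monthly = pd.Series(np.random.default_rng(0).normal(size=len(months)),
                    index=months)

# One value per calendar month (1-12), averaged across all 20 years;
# this is what the slider pages through, one map per month.
climatology = monthly.groupby(monthly.index.month).mean()
```

Because only twelve aggregated fields need to be held in memory, regenerating the choropleth as the slider moves stays fast.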
While Streamlit provides an intuitive and polished interface, the main challenge was packaging this application as a standalone Windows executable that can be used without having to open, or even install, Python. I tried a few approaches, and the one that worked is described in this tutorial. The source code, a full list of the Python libraries used, and the detailed methodology to recreate this project can be found on my GitHub.
I am currently refining this application further and adding functionality, including the ability to visualize other datasets. Any input and feedback are highly appreciated.