Authors
Alice Dumay, Grégoire Petit | MS Computer Science, Institut Mines-Télécom x Georgia Tech
Austin Lipinski, Taylor Brooks | MS Analytics, Georgia Tech
CSE 6242 Data and Visual Analytics | Fall 2020
Background
Given the complexities of using unsupervised learning to define climate biomes,
no universally accepted model exists. Our goal was therefore to develop an interactive,
dynamic tool that helps advance the standard toward machine learning-based
climate classification. We scraped the
National Centers for Environmental Information
website for weather station data and applied dynamic time warping to remove seasonal variation.
Then we employed two different clustering algorithms [1, 2] to
cluster the data by three weather parameters (temperature, precipitation, and pressure) over many
combinations of tuning parameters. We then selected the top-performing models
based on accuracy and stability.
Finally, empty areas without weather stations were filled by growing the clusters outward, and
temporal smoothing was applied to reduce noise. For more details, check out the
full report and
poster.
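As a rough illustration of the last two steps described above (growing clusters into empty areas and temporal smoothing), here is a minimal sketch. It is not the project's code. It assumes the map is a regular grid in which cells without a weather station hold `None`, that growth proceeds one cell at a time to the four direct neighbours, and that smoothing is a sliding-window majority vote over a cell's yearly labels. The function names `grow_clusters` and `smooth_labels` are hypothetical.

```python
from collections import Counter, deque


def grow_clusters(grid):
    """Fill None cells with the label of the nearest labelled cell.

    Multi-source breadth-first search: every labelled cell expands one
    step at a time into its 4 direct neighbours until the grid is full.
    """
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]  # leave the input untouched
    queue = deque(
        (r, c) for r in range(rows) for c in range(cols)
        if filled[r][c] is not None
    )
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and filled[nr][nc] is None:
                filled[nr][nc] = filled[r][c]
                queue.append((nr, nc))
    return filled


def smooth_labels(labels, window=3):
    """Replace each label by the most common label in a centred window.

    Votes are counted on the original sequence; on a tie that includes
    the current label, the current label is kept.
    """
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        counts = Counter(labels[max(0, i - half): i + half + 1])
        best = max(counts.values())
        winners = [lab for lab, n in counts.items() if n == best]
        smoothed.append(current if current in winners else winners[0])
    return smoothed


# Hypothetical example: two stations (clusters 0 and 1) on a 2 x 4 grid.
sparse = [[0, None, None, 1],
          [None, None, None, None]]
full_grid = grow_clusters(sparse)

# Hypothetical yearly labels for one cell, with a one-year blip to cluster 1.
yearly = smooth_labels([0, 0, 1, 0, 0, 2, 2, 2])
```

In this sketch the isolated one-year switch to cluster 1 is voted away, while the sustained change to cluster 2 is kept. A wider `window` would smooth more aggressively at the cost of delaying real shifts.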
Usage Notes
Model Selection:
The Biome Accurate [1] model closely matches known classifications such as the
Köppen-Geiger classification,
while the Latitude Focus [2] model emphasizes differences across latitudes and shows
subtle shifts over time more clearly.
Cluster Selection: This parameter controls how coarse or fine the clustering is.
Timeline: Use the timeline slider and play button to observe cluster movement over time.
Mouseover Stats: Hover over a region to see the average weather statistics for its cluster.
[1] The Biome Accurate option uses the Expectation Maximization clustering algorithm.
[2] The Latitude Focus option uses the k-means clustering algorithm.
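For readers unfamiliar with k-means, the algorithm behind the Latitude Focus option, the following is a minimal standard-library sketch of the idea. It is not the project's implementation. The station values are made up, the function names are hypothetical, standardizing the features first is an assumption, and farthest-point initialization is used here only as a simple deterministic stand-in for whatever seeding the project used. Expectation Maximization is not shown.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance so that no single
    weather parameter dominates the distance computation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far (an assumed seeding)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def k_means(points, k, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its members."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up stations: (temperature in C, precipitation in mm, pressure in hPa).
stations = [
    (25.0, 200.0, 1010.0), (26.0, 210.0, 1009.0), (24.0, 190.0, 1011.0),
    (-5.0, 20.0, 1020.0), (-6.0, 25.0, 1021.0), (-4.0, 15.0, 1019.0),
]
labels, centroids = k_means(standardize(stations), k=2)
```

Raising `k` in this sketch splits the stations into more, smaller groups, which corresponds to the kind of granularity that the Cluster Selection control exposes.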