I fabricated a lot of the technical details on this page. It’s difficult writing a technical blog post about a fictional system!

Please also note that the images in this example aren’t high quality. For a real post like this, I’d spend time creating clean visuals or work with the design team to create them.

Ch-ch-changes: How FIO models adjust to change and self-correct

The atmosphere is as dynamic a system as you can find. Butterflies are flapping their wings all the time and causing typhoons in Japan (or something like that). The point is that the system is incredibly complex and always changing.

Forecasts for this ever-changing system have to adapt. Many models use static inputs, but FIO’s models constantly take in the most recent data, process it, and adjust their forecasts accordingly. This post focuses on the self-correction pieces; we’ll cover how our forecast models work in more detail in a separate post.

Ingesting data

FIO uses Kafka for real-time data ingestion and processing. At a high level, we use a standard configuration that looks like this.

Producers

Producers are the sources of our real-time weather measurements. Most of the data is public, but a few sources come from private partnerships.

The data we ingest comes in different forms, and some of it is pushed directly to our systems. We also pull data in at regular intervals using a combination of API calls and database queries.
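The pull side of ingestion can be pictured as a simple polling loop. This is a minimal, hypothetical sketch: the in-memory `broker` dictionary stands in for Kafka, and `poll_once`, `run_poller`, and the example sources are illustrative names, not FIO’s actual code. In practice each fetch function would wrap an API call or database query.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a Kafka cluster: topic name -> list of records.
Broker = Dict[str, List[dict]]


def poll_once(sources: Dict[str, Callable[[], List[dict]]], broker: Broker) -> int:
    """Pull from every source once and publish each record to its topic.

    `sources` maps a topic name to a fetch function. Returns the number of
    records published in this round.
    """
    published = 0
    for topic, fetch in sources.items():
        for record in fetch():
            broker[topic].append(record)  # a real system would hand this to a Kafka producer
            published += 1
    return published


def run_poller(sources, broker: Broker, interval_s: float, rounds: int) -> None:
    """Call poll_once every `interval_s` seconds for a fixed number of rounds."""
    for i in range(rounds):
        poll_once(sources, broker)
        if i < rounds - 1:
            time.sleep(interval_s)


# Illustrative sources that return one reading each time they are polled.
broker: Broker = defaultdict(list)
sources = {
    "temperature": lambda: [{"station": "KSEA", "celsius": 12.5}],
    "pressure": lambda: [{"station": "KSEA", "hpa": 1013.2}],
}
run_poller(sources, broker, interval_s=0.0, rounds=3)
print(len(broker["temperature"]))  # 3
```

Pushed data would skip the loop entirely and go straight to the producer; polling is only needed for sources that don’t push.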

Topics

Topics are how we organize data from producers. There are thousands of topics, ranging from temperature records to moon phases to measurements of water vapor released by volcanic eruptions. All of these variables contribute to weather in both the near and long term.
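One way to organize records into topics is to derive the topic name from the measured variable. The sketch below assumes a hypothetical `fio.obs.<variable>` naming convention; the prefix, `topic_for`, and `route` are illustrative, not FIO’s real topic scheme.

```python
from collections import defaultdict

# Hypothetical naming convention: one topic per measured variable.
TOPIC_PREFIX = "fio.obs."


def topic_for(record: dict) -> str:
    """Return the topic name for a record, normalized from its 'variable' field."""
    variable = record["variable"].strip().lower().replace(" ", "_")
    return TOPIC_PREFIX + variable


def route(records):
    """Group incoming records by their destination topic."""
    topics = defaultdict(list)
    for record in records:
        topics[topic_for(record)].append(record)
    return topics


records = [
    {"variable": "Temperature", "value": 18.2},
    {"variable": "Moon phase", "value": 0.47},
    {"variable": "Volcanic water vapor", "value": 3.1},
    {"variable": "temperature", "value": 18.4},
]
topics = route(records)
print(sorted(topics))
# ['fio.obs.moon_phase', 'fio.obs.temperature', 'fio.obs.volcanic_water_vapor']
```

Normalizing the name means "Temperature" and "temperature" land in the same topic, so consumers only have to subscribe once per variable.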

Consumers

Consumers are internal applications that use topic data to create forecasts. They also perform tasks like persisting real-time data into internal databases. We do this because every real-time measurement eventually becomes historical data that we use to improve our models.
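A persisting consumer boils down to "take a batch of records from a topic, write them to a history table." In this minimal sketch, an in-memory SQLite database stands in for FIO’s internal databases, and the `observations` table, `persist` function, and sample batch are hypothetical.

```python
import sqlite3
from typing import Iterable


def persist(conn: sqlite3.Connection, topic: str, records: Iterable[dict]) -> int:
    """Write consumed records to a history table for later model improvement."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        " topic TEXT NOT NULL, station TEXT NOT NULL,"
        " observed_at TEXT NOT NULL, value REAL NOT NULL)"
    )
    rows = [(topic, r["station"], r["observed_at"], r["value"]) for r in records]
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
# Illustrative batch, as if just consumed from a temperature topic.
batch = [
    {"station": "KSEA", "observed_at": "2024-05-01T17:00:00Z", "value": 18.2},
    {"station": "KSEA", "observed_at": "2024-05-01T17:00:05Z", "value": 18.1},
]
persist(conn, "fio.obs.temperature", batch)
print(conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0])  # 2
```

Writing a whole batch in one transaction keeps the table consistent: either every record in the batch is stored or none is.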

Self-correction

FIO models generate forecasts nearly every second, and we store them so they can be compared against real data later. This validation is essentially a two-step process:

  1. At each moment, forecasts are generated for several time intervals: seconds, days, and weeks into the future.

  2. Each forecast is validated when real-time data for its forecasted time arrives.
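The two steps above can be sketched as a store that holds forecasts keyed by the time they predict and releases them when an observation for that time arrives. This is a hypothetical illustration: `Forecast`, `ForecastStore`, and the error convention (forecast minus observed) are assumptions, and it matches timestamps exactly, whereas real observations would need to be matched to the nearest forecasted time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Forecast:
    issued_at: datetime
    target_time: datetime
    value: float


class ForecastStore:
    """Hypothetical store: forecasts wait, keyed by target time, until data arrives."""

    def __init__(self):
        self._pending = defaultdict(list)  # target_time -> [Forecast, ...]

    def add(self, forecast: Forecast) -> None:
        # Step 1: store each forecast under the time it predicts.
        self._pending[forecast.target_time].append(forecast)

    def on_observation(self, observed_at: datetime, value: float):
        # Step 2: data for a target time validates every forecast made for it.
        # Returns (issued_at, target_time, forecast - observed) per forecast.
        forecasts = self._pending.pop(observed_at, [])
        return [(f.issued_at, f.target_time, f.value - value) for f in forecasts]


store = ForecastStore()
t0 = datetime(2024, 5, 1, 17, 0, 0)
store.add(Forecast(t0, t0 + timedelta(seconds=5), 18.0))
store.add(Forecast(t0 + timedelta(seconds=1), t0 + timedelta(seconds=5), 18.3))
errors = store.on_observation(t0 + timedelta(seconds=5), 18.1)
```

Popping validated forecasts keeps the store from growing without bound: once a forecast’s target time has passed and been checked, it is no longer pending.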

A forecast through time

Let’s say forecasts are generated at 5pm. Five seconds later, new data comes in and is used to validate the forecast made for 5:00:05pm. At 5pm the next day, the forecast made 24 hours earlier is validated against current data. And a week later, real-time data is used again to validate the forecast made seven days earlier.
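The timeline above is just the issue time plus each horizon. This small sketch computes when each 5pm forecast gets validated; `HORIZONS` holds only the three intervals from the example, and the naive datetimes are a simplification (a real scheduler would likely use UTC to avoid daylight-saving surprises).

```python
from datetime import datetime, timedelta

# The three horizons from the example; FIO's models forecast more intervals than these.
HORIZONS = {
    "5 seconds": timedelta(seconds=5),
    "24 hours": timedelta(hours=24),
    "7 days": timedelta(days=7),
}


def validation_schedule(issued_at: datetime) -> dict:
    """Map each horizon name to the time its forecast will be validated."""
    return {name: issued_at + delta for name, delta in HORIZONS.items()}


for name, when in validation_schedule(datetime(2024, 5, 1, 17, 0, 0)).items():
    print(f"{name:>9}: {when:%Y-%m-%d %H:%M:%S}")
# 5 seconds: 2024-05-01 17:00:05
#  24 hours: 2024-05-02 17:00:00
#    7 days: 2024-05-08 17:00:00
```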

This process loops constantly. The time intervals in this example are only a subset of those FIO’s models forecast for, but they give you an idea of the scale of the process.
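To put a rough number on that scale: a forecast stays pending for exactly its horizon, so at steady state the backlog per horizon is the issue rate times the horizon length (Little’s law). The sketch assumes exactly one forecast per second per horizon for a single location and variable, a simplification of "nearly every second."

```python
# Assumption: one forecast per second per horizon, for one location and one variable.
HORIZON_SECONDS = {"5 seconds": 5, "24 hours": 24 * 3600, "7 days": 7 * 24 * 3600}


def pending_forecasts(horizons: dict, forecasts_per_second: float = 1.0) -> dict:
    """Steady-state count of forecasts awaiting validation, per horizon.

    Little's law: backlog = arrival rate x time each item spends waiting.
    """
    return {name: int(forecasts_per_second * secs) for name, secs in horizons.items()}


backlog = pending_forecasts(HORIZON_SECONDS)
print(backlog)                # {'5 seconds': 5, '24 hours': 86400, '7 days': 604800}
print(sum(backlog.values()))  # 691205
```

Multiply that by every location and variable, and the long horizons dominate storage: the 7-day queue alone is about 88% of the backlog in this example.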

Making adjustments

After validation, a few things can happen. If the models are accurate, not much changes: we continue with the same forecast approach and configuration. If they’re off, we have several knobs and levers we can use to make adjustments.
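Deciding whether a model is "off" needs some summary of recent validation errors. This sketch uses two standard measures, mean error (bias) and mean absolute error, and a hypothetical tolerance `BIAS_TOLERANCE_C`; the threshold and the keep/adjust rule are illustrative assumptions, not FIO’s actual criteria.

```python
from statistics import mean

# Hypothetical tolerance: systematic error (in °C) accepted before adjusting.
BIAS_TOLERANCE_C = 0.5


def assess(errors):
    """Summarize recent errors (forecast - observed) and decide whether to act.

    Bias (mean error) catches systematic drift, e.g. forecasts that are
    consistently too warm; MAE measures overall accuracy.
    """
    bias = mean(errors)
    mae = mean(abs(e) for e in errors)
    action = "adjust" if abs(bias) > BIAS_TOLERANCE_C else "keep"
    return {"bias": bias, "mae": mae, "action": action}


print(assess([0.1, -0.2, 0.15, -0.05])["action"])  # keep
print(assess([1.2, 0.9, 1.1, 0.8])["action"])      # adjust
```

Bias is the more useful trigger here: random errors that cancel out don’t call for a change, but forecasts that are consistently too warm do.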

As an example, let’s say daytime temperatures are trending lower than forecasted for a particular location. Our models immediately start examining current data to determine why. For this example, let’s say the models find that cloud cover is higher than expected, which means less sunlight is reaching the Earth’s surface and temperatures are lower as a result. The models then use traditional physics equations to determine how much cooler it should be given the observed cloud cover, and this information is incorporated into the next forecast.
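Here is a toy version of that cloud-cover correction. It uses the widely cited empirical attenuation form of Kasten and Czeplak (1980), G = G_clear × (1 − 0.75 × (N/8)^3.4) with N in oktas, to estimate how much sunlight the extra clouds block. The clear-sky irradiance and the linear temperature sensitivity are hypothetical placeholder values, and a real model would account for far more (humidity, wind, surface type).

```python
def cloud_factor(oktas: float) -> float:
    """Fraction of clear-sky solar irradiance reaching the surface.

    Empirical form: 1 - 0.75 * (N / 8) ** 3.4, with N = cloud cover in oktas (0-8).
    """
    if not 0 <= oktas <= 8:
        raise ValueError("cloud cover must be between 0 and 8 oktas")
    return 1.0 - 0.75 * (oktas / 8.0) ** 3.4


# Hypothetical values for illustration only.
CLEAR_SKY_IRRADIANCE_W_M2 = 800.0  # assumed midday clear-sky irradiance at the site
SENSITIVITY_C_PER_W_M2 = 0.02      # assumed temperature response per W/m^2 lost


def temperature_adjustment(expected_oktas: float, observed_oktas: float) -> float:
    """Temperature change (°C) implied by observed vs. expected cloud cover."""
    lost = CLEAR_SKY_IRRADIANCE_W_M2 * (
        cloud_factor(expected_oktas) - cloud_factor(observed_oktas)
    )
    return -SENSITIVITY_C_PER_W_M2 * lost


delta = temperature_adjustment(expected_oktas=2, observed_oktas=6)
print(f"{delta:+.1f} °C")  # -4.4 °C
```

With these placeholder numbers, going from an expected 2 oktas to an observed 6 blocks about 220 W/m² of sunlight, which the next forecast would translate into a cooler afternoon.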

Until next time

We hope this brief look at how our models self-correct has been interesting. Let us know in the comments if there are other topics you’d like us to cover!