How it detects wildfires
Satellite data and machine learning at the core of this system
Two things sit at the heart of this project: a satellite feed that tracks fire across the Indonesian archipelago in near real time, and a machine learning model that learns what normal looks like — so it can tell you when something is not.
This page explains how they work, why they were chosen, and what the system is doing each time it processes a day's worth of data.
The satellite data
This system pulls data from NASA FIRMS: the Fire Information for Resource Management System. FIRMS aggregates active fire detections from two separate satellite instruments, VIIRS and MODIS, and makes that data available within hours of capture. Both instruments are mounted on satellites that orbit the Earth continuously, scanning the same locations multiple times per day.
VIIRS — the primary instrument
VIIRS stands for Visible Infrared Imaging Radiometer Suite. It flies aboard the Suomi NPP and NOAA-20 satellites and is currently the more sensitive of the two instruments used here. Its spatial resolution is 375 metres per pixel, which means it can detect smaller, more recent fires that MODIS would miss.
For a country like Indonesia, with dense tropical forest and peatland that can smoulder for weeks before becoming visible, that precision matters.
VIIRS detects thermal anomalies by comparing the radiation it measures in its mid-infrared and thermal infrared bands for each pixel of the Earth's surface. When a pixel is anomalously hot relative to surrounding pixels, it gets flagged as a potential fire. The instrument does this across the full swath of its orbit, typically revisiting the same location at least twice per day.
MODIS — the historical layer
MODIS, the Moderate Resolution Imaging Spectroradiometer, has been operating since 1999, first aboard the Terra satellite and later Aqua. Its spatial resolution is coarser (1 kilometre per pixel), but it carries over two decades of consistent fire data. That history makes it invaluable for establishing baselines: what does fire activity in a given region typically look like at this time of year?
The system ingests both feeds daily into a PostgreSQL database with the PostGIS spatial extension. Each incoming record carries coordinates, a confidence score, fire radiative power, and a timestamp.
Spatial indexing with H3
Raw hotspot coordinates — latitude and longitude pairs — are not especially useful on their own. Two fires 200 metres apart in the same peatland concession are part of the same event. A fire in Kalimantan and one in Sulawesi are not. Making those distinctions at scale, consistently, requires a spatial index.
This system uses H3, a hierarchical hexagonal grid system developed by Uber. The world is divided into hexagonal cells at multiple resolutions; cells at a given resolution are close to uniform in area, and each cell has a unique identifier. In this deployment, incoming hotspot coordinates are mapped to H3 cells at a resolution that makes each cell roughly 86 square kilometres.
Once coordinates are mapped to cells, the data is grouped by cell and date — what the system calls a cell-day. A single cell-day aggregates all hotspot detections in a given area on a given date into a set of features: total hotspot count, average fire radiative power, confidence-weighted counts, and rolling trends from prior days.
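The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Hotspot` class, the `aggregate_cell_days` function, the feature names, and the sample values are all hypothetical, and the H3 cell identifier is assumed to have been assigned upstream. Rolling trends are left out to keep the sketch short.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hotspot:
    # Hypothetical record shape; field names are assumptions.
    cell_id: str       # H3 cell identifier, assumed to be assigned upstream
    day: date          # acquisition date
    confidence: float  # confidence on a 0 to 100 scale
    frp_mw: float      # fire radiative power in megawatts


def aggregate_cell_days(hotspots):
    """Group hotspots by (cell_id, day) and compute per-group features."""
    groups = defaultdict(list)
    for h in hotspots:
        groups[(h.cell_id, h.day)].append(h)

    features = {}
    for key, members in groups.items():
        features[key] = {
            "hotspot_count": len(members),
            "mean_frp_mw": sum(m.frp_mw for m in members) / len(members),
            # Each detection contributes its confidence as a fraction of 1,
            # so low-confidence readings count for less.
            "confidence_weighted_count": sum(m.confidence / 100 for m in members),
        }
    return features


# Made-up sample data for illustration only.
sample = [
    Hotspot("cell_a", date(2024, 9, 1), 90.0, 12.0),
    Hotspot("cell_a", date(2024, 9, 1), 50.0, 8.0),
    Hotspot("cell_b", date(2024, 9, 1), 30.0, 4.0),
]
cell_days = aggregate_cell_days(sample)
```

Keying the result on the pair of cell and date keeps each cell-day addressable on its own, which is convenient when rolling features from earlier days are joined in later.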
Isolation Forest: the anomaly detection model
The model is an Isolation Forest — an unsupervised machine learning algorithm designed specifically to detect anomalies. Understanding why this was chosen requires understanding what kind of problem wildfire detection actually is.
The problem with thresholds
The naive approach to fire detection is to set a threshold: if more than X hotspots appear in a cell on a given day, raise an alert. The problem is that X varies enormously by location, season, and land cover type. A count that signals danger in a rainforest conservation area might be completely normal in a managed agricultural burn zone during the dry season.
Fixed thresholds either flood you with false positives or miss genuine threats. The alternative is to let the model learn what normal looks like — for each part of the country, for each time of year — and then flag what deviates from that baseline. That is what Isolation Forest does.
How Isolation Forest works
Isolation Forest is built on a surprisingly simple idea. If you repeatedly partition a dataset at random — picking a feature, picking a random value within that feature's range, splitting the data — anomalies get isolated faster than normal observations.
Normal data points cluster together. They share similar feature values with many neighbours, so it takes many random cuts to separate them. Anomalies sit far from the cluster — a small number of cuts isolates them quickly. The model measures how many cuts it takes and converts that into an anomaly score. Low isolation depth means high anomaly score.
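To make the idea concrete, here is a small pure-Python sketch of the random-partitioning scheme. It is a teaching example under stated assumptions, not the model used in this system: the function names are invented, the two-feature data is synthetic, and every tree is built on the full dataset rather than on random subsamples as a standard Isolation Forest would do. The score follows the original formulation, where values closer to 1 mean more anomalous; some libraries, scikit-learn among them, report the sign-flipped version where lower means more anomalous.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Average depth of an unsuccessful search in a binary tree of n points.

    Used to normalise depths and to credit leaves that still hold several points.
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, rng, depth, max_depth):
    """Split on a random feature at a random value until points are isolated."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this feature; stop here for simplicity
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(node, point, depth=0):
    """Number of cuts needed to reach the leaf that contains the point."""
    if "dim" not in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(child, point, depth + 1)


def anomaly_score(trees, point, n_train):
    """Shallow average depth gives a score near 1; deep gives a lower score."""
    mean_depth = sum(path_length(t, point) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(n_train))


# Synthetic feature vectors: 200 clustered observations plus one far away.
rng = random.Random(0)
normal_days = [(rng.gauss(3, 1), rng.gauss(10, 2)) for _ in range(200)]
unusual_day = (40.0, 90.0)
data = normal_days + [unusual_day]

max_depth = math.ceil(math.log2(len(data)))
trees = [build_tree(data, rng, 0, max_depth) for _ in range(50)]

typical_score = anomaly_score(trees, (3.0, 10.0), len(data))
unusual_score = anomaly_score(trees, unusual_day, len(data))
```

The far-away point is usually cut off within the first split or two, so `unusual_score` comes out higher than `typical_score`, whose point sits in the middle of the cluster and survives many cuts.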
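In this system, the model is trained on the full set of aggregated cell-day features. Once trained, it scores each incoming cell-day. The contamination parameter, which is the expected share of anomalous observations, determines the score cutoff. Scores are reported so that lower means more anomalous, and cell-days scoring below the cutoff get flagged for the alert ranking step.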
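The flagging step can be sketched as picking the most anomalous fraction of scored cell-days. The function below is a hypothetical illustration, not the system's code: `flag_by_contamination` and the sample scores are invented, and it assumes the convention stated in this section, where a lower score means a more anomalous cell-day.

```python
def flag_by_contamination(scores, contamination):
    """Return the indices of the lowest-scoring fraction of observations.

    Assumes lower score = more anomalous. `contamination` is the expected
    share of anomalies, for example 0.2 for one in five.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n_flag = max(1, round(len(scores) * contamination))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:n_flag]


# Made-up scores for ten cell-days.
scores = [0.12, 0.08, -0.31, 0.10, 0.05, -0.02, 0.11, 0.09, 0.07, 0.13]
flagged = flag_by_contamination(scores, contamination=0.2)
```

With these values, the two lowest scores sit at positions 2 and 5, so those are the cell-days passed on to ranking. Raising the contamination value flags more cell-days; lowering it makes the system more selective.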
Features the model uses
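The cell-day features passed to Isolation Forest include those produced by the aggregation step: total hotspot count, average fire radiative power, confidence-weighted counts, and rolling trends from prior days.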
From anomalies to ranked alerts
Flagging 752 anomalous cell-days is not the same as delivering 752 useful alerts. An alert system that surfaces everything surfaces nothing.
The final step combines two signals to produce a ranked daily list. The first is the raw anomaly score from Isolation Forest. The second is spatial coherence: are the neighbouring H3 cells also showing elevated activity?
An isolated anomaly in a single cell with no surrounding signal is more likely to be a data artefact. A cluster of anomalous cells across a contiguous area is more likely to be a real fire.
Inference
This system does not confirm wildfires. It identifies conditions that, taken together, are statistically inconsistent with normal behaviour for a given location on a given day. The alert is the output of that comparison.
Every hotspot record from VIIRS and MODIS carries two quality indicators: a confidence value and a fire radiative power value measured in megawatts. FIRMS reports MODIS confidence as a percentage from 0 to 100 and VIIRS confidence as a low, nominal, or high class; this system works with confidence on a 0 to 100 scale. Confidence reflects how certain the instrument is that the reading is a genuine thermal anomaly rather than sensor noise or cloud interference. Radiative power reflects the intensity of heat being released at the surface.
High confidence combined with high radiative power is a strong indicator. Low confidence readings are included in the model but weighted down proportionally.
The most diagnostic feature in the model is not the raw hotspot count. It is the ratio of today's count to the rolling 7-day average for that cell. A cell that normally shows 2 hotspots per day and shows 18 today has a ratio of 9. A cell that normally shows 15 and shows 18 has a ratio of 1.2.
Today's count is the same in both cases. The deviation from baseline is not. This ratio is what separates a genuinely unusual day from a cell that is simply active by nature.
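The arithmetic behind the two cases above can be written out as a short function. This is a sketch with assumptions: the name `baseline_ratio` is invented, the rolling average is taken over the seven days before today, and the `floor` parameter is an added safeguard (not described in the source) that avoids dividing by zero in cells with no recent activity.

```python
def baseline_ratio(today_count, prior_counts, floor=1.0):
    """Ratio of today's hotspot count to the mean of the prior seven days.

    `floor` is an assumed safeguard: baselines below it are raised to it,
    so quiet cells do not produce a division by zero.
    """
    window = prior_counts[-7:]
    baseline = sum(window) / len(window) if window else 0.0
    return today_count / max(baseline, floor)


quiet_cell = baseline_ratio(18, [2, 2, 2, 2, 2, 2, 2])          # 18 / 2
busy_cell = baseline_ratio(18, [15, 15, 15, 15, 15, 15, 15])    # 18 / 15
```

Here `quiet_cell` evaluates to 9.0 and `busy_cell` to 1.2, matching the worked example in the text.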
A single anomalous cell is weak evidence. Before an alert is generated, the system checks whether surrounding H3 cells are also showing elevated anomaly scores. A wildfire spreads across an area. A sensor artefact or isolated industrial heat source does not.
Spatial clustering across adjacent cells is the final corroborating signal. Flagged cell-days with no supporting signal in their neighbours are filtered out before ranking.
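One way to picture the neighbour check and the ranking is the sketch below. It is illustrative only: the source does not specify how the two signals are combined, so the ordering used here (anomaly score first, number of flagged neighbours as a tie-breaker) is an assumption, as are the names `rank_alerts`, `flag_threshold`, and `min_support`. The neighbour map is passed in as a plain dictionary standing in for an H3 adjacency lookup, and scores are treated as higher = more anomalous; negate them first if your library reports the opposite.

```python
def rank_alerts(scores, neighbours, flag_threshold, min_support=1):
    """Drop flagged cells with no flagged neighbours, then rank the rest.

    scores:      cell id -> anomaly score (assumed: higher = more anomalous)
    neighbours:  cell id -> list of adjacent cell ids (stand-in for H3 adjacency)
    Returns a list of (cell id, score, supporting neighbour count) tuples.
    """
    flagged = {cell for cell, s in scores.items() if s >= flag_threshold}
    alerts = []
    for cell in flagged:
        support = sum(1 for n in neighbours.get(cell, []) if n in flagged)
        if support >= min_support:
            alerts.append((cell, scores[cell], support))
    # Assumed ordering: strongest score first, more neighbour support breaks ties.
    alerts.sort(key=lambda a: (a[1], a[2]), reverse=True)
    return alerts


# Made-up scores and adjacency for five cells.
scores = {"a": 0.9, "b": 0.8, "c": 0.85, "d": 0.4, "e": 0.3}
neighbours = {
    "a": ["b", "d"],
    "b": ["a"],
    "c": ["d", "e"],
    "d": ["a", "c", "e"],
    "e": ["c", "d"],
}
alerts = rank_alerts(scores, neighbours, flag_threshold=0.7)
```

In this toy data, cells `a` and `b` support each other and are kept, while `c` has a high score but no flagged neighbours and is filtered out before ranking.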
The output is a ranked signal, not a confirmation. A high-ranked alert means multiple independent indicators — the satellite reading, the deviation from baseline, and the spatial pattern across neighbouring cells — are all pointing in the same direction.
The further down the ranked list, the thinner that supporting evidence becomes. Ground verification is still required. The system's job is to tell you where to look first.
Open source and adaptable
The full pipeline (ingestion, spatial aggregation, model training, alert generation, and the dashboard) is released under the MIT licence. The codebase is built for Indonesia, but the architecture is not Indonesia-specific. The H3 grid works globally. NASA FIRMS covers the world. Isolation Forest makes no geographic assumptions.
Adapting the system to another region means configuring the geographic bounds for data ingestion, retraining the model on local data, and adjusting the contamination parameter to match regional fire prevalence. The core logic does not change.
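As a rough picture of what that configuration might look like, the sketch below bundles the region-specific settings into one object. Everything in it is hypothetical: the `RegionConfig` class and its field names are not taken from the codebase, the bounding box is only an approximate envelope around Indonesia, and the contamination value is a placeholder rather than the project's setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionConfig:
    # Hypothetical settings object; names and values are illustrative.
    name: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    contamination: float  # expected share of anomalous cell-days

    def contains(self, lat, lon):
        """True if the point falls inside the ingestion bounding box.

        Does not handle boxes that cross the antimeridian.
        """
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Approximate bounds and a placeholder contamination value.
INDONESIA = RegionConfig(
    name="indonesia",
    min_lon=95.0, min_lat=-11.0,
    max_lon=141.0, max_lat=6.0,
    contamination=0.01,
)

in_kalimantan = INDONESIA.contains(lat=-2.2, lon=113.9)
in_tokyo = INDONESIA.contains(lat=35.7, lon=139.7)
```

Porting to another region would then amount to defining a second `RegionConfig` with local bounds and a contamination value suited to local fire prevalence, then retraining the model on data ingested within those bounds.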