TECHNICAL DOCUMENTATION

How it detects wildfires

Satellite data and machine learning at the core of this system

Two things sit at the heart of this project: a satellite feed that tracks fire across the Indonesian archipelago in near real time, and a machine learning model that learns what normal looks like — so it can tell you when something is not.

This page explains how they work, why they were chosen, and what the system is doing each time it processes a day's worth of data.

[Figure: satellite hotspot detection over the Indonesian archipelago]

The satellite data

This system pulls data from NASA FIRMS: the Fire Information for Resource Management System. FIRMS aggregates active fire detections from two separate satellite instruments, VIIRS and MODIS, and makes that data available within hours of capture. Both instruments are mounted on satellites that orbit the Earth continuously, scanning the same locations multiple times per day.

NASA FIRMS processes over 100,000 hotspot detections per year across Indonesia alone. The raw feed is public, free, and updated daily.

11,867 hotspot records ingested (VIIRS and MODIS combined)
<1hr detection latency (satellite capture to alert)
21M+ hectares burned annually (Indonesian forests)

VIIRS — the primary instrument

VIIRS stands for Visible Infrared Imaging Radiometer Suite. It flies aboard the Suomi NPP and NOAA-20 satellites and is currently the more sensitive of the two instruments used here. Its spatial resolution is 375 metres per pixel, which means it can detect smaller, earlier-stage fires that MODIS would miss.

For a country like Indonesia with dense tropical forest and peatland that can smoulder for weeks before becoming visible, precision matters.

VIIRS detects thermal anomalies by comparing the brightness temperatures it measures in its mid-infrared and thermal infrared bands. When a pixel is anomalously hot relative to surrounding pixels, it gets flagged as a potential fire. The instrument does this across the full swath of its orbit, typically revisiting the same location at least twice per day.

MODIS — the historical layer

MODIS, the Moderate Resolution Imaging Spectroradiometer, has flown since 1999, first aboard the Terra satellite and, from 2002, aboard Aqua as well. Its spatial resolution is coarser (1 kilometre per pixel), but it carries over two decades of consistent fire data. That history makes it invaluable for establishing baselines: what does fire activity in a given region typically look like at this time of year?

The system ingests both feeds daily into a PostgreSQL database with the PostGIS spatial extension. Each incoming record carries coordinates, a confidence score, fire radiative power, and a timestamp.
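As a rough illustration of what one incoming record might look like in code, the sketch below parses FIRMS-style CSV text into typed records. The column names follow the FIRMS CSV layout as commonly published, but treat them, the Hotspot class, the sample rows, and especially the numeric values chosen for the VIIRS confidence classes as assumptions made for this example, not as the project's actual ingestion code.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed mapping from VIIRS confidence classes to a 0-100 scale.
# The numeric values are illustrative, not taken from the project.
VIIRS_CONFIDENCE = {"l": 30.0, "n": 60.0, "h": 90.0}


@dataclass(frozen=True)
class Hotspot:
    latitude: float
    longitude: float
    confidence: float      # normalised to 0-100
    frp_mw: float          # fire radiative power in megawatts
    acquired_at: datetime  # acquisition time in UTC
    instrument: str


def parse_confidence(raw: str) -> float:
    """Return a 0-100 confidence from either a class letter or a number."""
    value = raw.strip().lower()
    if value in VIIRS_CONFIDENCE:
        return VIIRS_CONFIDENCE[value]
    return float(value)


def parse_rows(csv_text: str) -> list[Hotspot]:
    """Parse FIRMS-style CSV text into Hotspot records."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # acq_time is HHMM; pad it in case leading zeros were dropped.
        stamp = f"{row['acq_date']} {row['acq_time'].strip().zfill(4)}"
        acquired = datetime.strptime(stamp, "%Y-%m-%d %H%M")
        records.append(Hotspot(
            latitude=float(row["latitude"]),
            longitude=float(row["longitude"]),
            confidence=parse_confidence(row["confidence"]),
            frp_mw=float(row["frp"]),
            acquired_at=acquired.replace(tzinfo=timezone.utc),
            instrument=row["instrument"],
        ))
    return records


# Made-up sample rows, for illustration only.
SAMPLE = """latitude,longitude,acq_date,acq_time,instrument,confidence,frp
-2.2105,113.9213,2023-09-14,0612,VIIRS,n,12.4
-0.5021,101.4478,2023-09-14,330,MODIS,82,45.1
"""

records = parse_rows(SAMPLE)
```

Normalising confidence at parse time keeps every later step working on a single numeric scale, whichever instrument a record came from.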

instrument   resolution     role
VIIRS        375m / pixel   primary detection, high precision
MODIS        1km / pixel    historical baseline, 25+ years of data

Spatial indexing with H3

Raw hotspot coordinates — latitude and longitude pairs — are not especially useful on their own. Two fires 200 metres apart in the same peatland concession are part of the same event. A fire in Kalimantan and one in Sulawesi are not. Making those distinctions at scale, consistently, requires a spatial index.

This system uses H3, a hierarchical hexagonal grid system developed by Uber. The world is divided into hexagonal cells at multiple resolutions; within a given resolution the cells have roughly equal area, and each cell has a unique identifier. In this deployment, incoming hotspot coordinates are mapped to H3 cells at a resolution that makes each cell roughly 86 square kilometres.

Hexagons are better than squares for spatial analysis because every hexagon has six immediate neighbours at the same distance from its centre. That property makes neighbour-based comparisons uniform, which matters when the model checks whether surrounding cells are also anomalous.

Once coordinates are mapped to cells, the data is grouped by cell and date — what the system calls a cell-day. A single cell-day aggregates all hotspot detections in a given area on a given date into a set of features: total hotspot count, average fire radiative power, confidence-weighted counts, and rolling trends from prior days.
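The grouping step can be sketched in a few lines. This is a minimal illustration, not the project's code: the Detection tuple and the function names are hypothetical, and grid_cell is a simple square-grid stand-in used so the example runs without extra packages. With the h3 Python package (version 4), the cell lookup would instead be a call such as h3.latlng_to_cell(lat, lng, resolution).

```python
import math
from collections import defaultdict
from datetime import date
from typing import Callable, Iterable, NamedTuple


class Detection(NamedTuple):
    lat: float
    lng: float
    day: date
    confidence: float  # 0-100
    frp_mw: float      # fire radiative power in megawatts


def grid_cell(lat: float, lng: float) -> str:
    """Stand-in cell index on a 0.1-degree square grid (NOT H3)."""
    return f"{math.floor(lat * 10)}:{math.floor(lng * 10)}"


def aggregate_cell_days(
    detections: Iterable[Detection],
    cell_of: Callable[[float, float], str] = grid_cell,
) -> dict:
    """Group detections by (cell, day) and compute per-group features."""
    groups = defaultdict(list)
    for d in detections:
        groups[(cell_of(d.lat, d.lng), d.day)].append(d)

    features = {}
    for key, rows in groups.items():
        frps = [r.frp_mw for r in rows]
        features[key] = {
            "total_hotspot_count": len(rows),
            # Each detection contributes its confidence as a fraction of 1.
            "confidence_weighted_count": sum(r.confidence / 100 for r in rows),
            "mean_fire_radiative_power": sum(frps) / len(frps),
            "max_fire_radiative_power": max(frps),
        }
    return features
```

Passing the cell lookup in as a parameter keeps the aggregation logic independent of the grid, so the stand-in can be swapped for a real H3 lookup without touching the rest. Rolling trends are left out here because they need several days of history per cell.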

11,867 → 7,765

raw hotspots aggregated to cell-days (~35% reduction)

That reduction is not information loss — it is the spatial structure the model needs to reason coherently across geography.

Isolation Forest: the anomaly detection model

The model is an Isolation Forest — an unsupervised machine learning algorithm designed specifically to detect anomalies. Understanding why this was chosen requires understanding what kind of problem wildfire detection actually is.

The problem with thresholds

The naive approach to fire detection is to set a threshold: if more than X hotspots appear in a cell on a given day, raise an alert. The problem is that X varies enormously by location, season, and land cover type. A count that signals danger in a rainforest conservation area might be completely normal in a managed agricultural burn zone during the dry season.

Fixed thresholds either flood you with false positives or miss genuine threats. The alternative is to let the model learn what normal looks like — for each part of the country, for each time of year — and then flag what deviates from that baseline. That is what Isolation Forest does.

How Isolation Forest works

Isolation Forest is built on a surprisingly simple idea. If you repeatedly partition a dataset at random — picking a feature, picking a random value within that feature's range, splitting the data — anomalies get isolated faster than normal observations.

Normal data points cluster together. They share similar feature values with many neighbours, so it takes many random cuts to separate them. Anomalies sit far from the cluster — a small number of cuts isolates them quickly. The model measures how many cuts it takes and converts that into an anomaly score. Low isolation depth means high anomaly score.
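The idea can be demonstrated with a toy version written from scratch. The sketch below follows a single point through repeated random cuts and counts how many it takes to isolate it. It is a simplification for illustration: a real Isolation Forest builds trees on random subsamples and converts the average depth into a normalised score, and all names and data here are made up.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Count the random cuts needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))
    values = [row[feature] for row in data]
    low, high = min(values), max(values)
    if low == high:
        return depth  # this feature cannot split the remaining rows
    cut = rng.uniform(low, high)
    # Keep only the rows on the same side of the cut as `point`.
    if point[feature] < cut:
        same_side = [row for row in data if row[feature] < cut]
    else:
        same_side = [row for row in data if row[feature] >= cut]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def mean_depth(point, data, trees=200, seed=0):
    """Average isolation depth over many independent random partitionings."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(trees))
    return total / trees


# A dense 10 x 10 cluster plus one point far away from it.
cluster = [(float(x), float(y)) for x in range(10) for y in range(10)]
outlier = (50.0, 50.0)
data = cluster + [outlier]

outlier_depth = mean_depth(outlier, data)
inlier_depth = mean_depth((5.0, 5.0), data)
```

Comparing outlier_depth with inlier_depth shows the effect described above: the distant point is separated after very few cuts, while a point in the middle of the cluster needs many more.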

The model requires no labelled training data — no historical record of "this was a dangerous fire" and "this was not." It learns from the structure of the data itself, which makes it practical to deploy in regions where labelled fire event data is sparse.

In this system, the model is trained on the full set of aggregated cell-day features. Once trained, it scores each incoming cell-day. The contamination parameter sets the share of cell-days expected to be anomalous, and the cell-days whose scores place them in that most anomalous share get flagged for the alert ranking step.

752 of 7,765 cell-days flagged as anomalous

Contamination rate of 9.7% — aligns with model configuration and known regional fire prevalence.
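How a contamination setting turns scores into flags can be sketched as follows. The function name is hypothetical, and the example assumes scores where higher means more anomalous; some libraries use the opposite sign convention, so the sort direction would need to be checked against whichever implementation is in use.

```python
def flag_most_anomalous(scores, contamination):
    """Return the ids of the `contamination` share with the highest scores.

    `scores` maps an item id to an anomaly score (higher = more anomalous).
    """
    if not 0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n_flagged = round(len(scores) * contamination)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_flagged])


# Made-up scores for ten cell-days, for illustration only.
example_scores = {
    "a": 0.41, "b": 0.45, "c": 0.93, "d": 0.38, "e": 0.52,
    "f": 0.47, "g": 0.81, "h": 0.44, "i": 0.40, "j": 0.49,
}
flagged = flag_most_anomalous(example_scores, contamination=0.2)
```

Because the threshold is a share rather than a fixed score, the number of flags scales with the size of the dataset, which is why the contamination value needs to reflect how common genuine fire anomalies are in the region.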

Features the model uses

The cell-day features passed to Isolation Forest include:

model feature                description
total_hotspot_count          Raw count of hotspot detections in the cell on this day
confidence_weighted_count    Hotspot count weighted by satellite confidence score
mean_fire_radiative_power    Average energy output of detected fires (MW)
max_fire_radiative_power     Peak fire energy; captures intensity extremes
rolling_7d_avg               Rolling average of hotspot count over prior 7 days
rolling_ratio                Today's count ÷ 7-day rolling avg; detects spikes above baseline

The rolling ratio is what allows the model to distinguish a peatland cell that consistently shows some fire activity (normal) from the same cell spiking dramatically above its baseline (anomalous). Without time-series context, the two would look identical in a snapshot.
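A minimal sketch of the two rolling features for a single cell is shown below. The function name is hypothetical, and the floor applied to the baseline is an assumption added here to keep the ratio finite for cells with little or no recent activity; the source does not say how the project handles that case.

```python
def rolling_features(daily_counts, window=7, floor=1.0):
    """Return (rolling_avg, rolling_ratio) for each day in `daily_counts`.

    The average covers up to `window` days before the current day and
    excludes the current day itself.
    """
    out = []
    for i, today in enumerate(daily_counts):
        prior = daily_counts[max(0, i - window):i]
        avg = sum(prior) / len(prior) if prior else 0.0
        # Assumed safeguard: never divide by a baseline below `floor`.
        ratio = today / max(avg, floor)
        out.append((avg, ratio))
    return out


# A quiet cell that spikes, and a busy cell that rises only slightly.
quiet_cell = rolling_features([2, 2, 2, 2, 2, 2, 2, 18])
busy_cell = rolling_features([15, 15, 15, 15, 15, 15, 15, 18])
```

On the final day both cells record 18 hotspots, but the quiet cell's ratio is 9 against a baseline of 2 while the busy cell's is 1.2 against a baseline of 15.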

From anomalies to ranked alerts

Flagging 752 anomalous cell-days is not the same as delivering 752 useful alerts. An alert system that surfaces everything surfaces nothing.

The final step combines two signals to produce a ranked daily list. The first is the raw anomaly score from Isolation Forest. The second is spatial coherence: are the neighbouring H3 cells also showing elevated activity?

An isolated anomaly in a single cell with no surrounding signal is more likely to be a data artefact. A cluster of anomalous cells across a contiguous area is more likely to be a real fire.

The coherence check is what separates a useful alert from noise. A fire spreading across peatland will light up multiple adjacent hexagons. A satellite calibration artefact will not.
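One way such a coherence filter and ranking could look is sketched below. The source does not specify how the two signals are combined, so the weighting, the min_support rule, and every name here are assumptions for illustration. Cells are represented as axial hexagon coordinates so the example runs on its own; with the h3 Python package (version 4) the neighbour lookup would be based on h3.grid_disk(cell, 1), which also returns the cell itself.

```python
# Offsets to the six neighbours of a hexagon in axial (q, r) coordinates.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_neighbours(cell):
    """Return the six cells adjacent to `cell` on an axial hex grid."""
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in HEX_DIRECTIONS]


def rank_alerts(scores, flagged, neighbours_of=hex_neighbours,
                min_support=1, neighbour_weight=0.5):
    """Drop flagged cells with no flagged neighbours, then rank the rest.

    `scores` maps cell -> anomaly score (higher = more anomalous) for one day.
    """
    alerts = []
    for cell in flagged:
        support = [n for n in neighbours_of(cell) if n in flagged]
        if len(support) < min_support:
            continue  # isolated anomaly, treated as a likely artefact
        neighbour_mean = sum(scores[n] for n in support) / len(support)
        combined = scores[cell] + neighbour_weight * neighbour_mean
        alerts.append((cell, combined))
    return sorted(alerts, key=lambda item: item[1], reverse=True)


# Made-up scores: two adjacent anomalous cells and one isolated one.
day_scores = {(0, 0): 0.9, (1, 0): 0.7, (5, 5): 0.95}
alerts = rank_alerts(day_scores, flagged=set(day_scores))
```

In this example the isolated cell at (5, 5) is dropped even though it has the highest individual score, which is the behaviour the coherence check is meant to produce.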

alert pipeline

7,765   total cell-days processed
752     flagged anomalous by Isolation Forest
649     alerts generated after coherence validation (103 single-cell artefacts filtered out)

Inference

How we decide something might be a fire

This system does not confirm wildfires. It identifies conditions that, taken together, are statistically inconsistent with normal behaviour for a given location on a given day. The alert is the output of that comparison.

01 — signal quality

Quality of the signal

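Every hotspot record from VIIRS and MODIS carries two quality indicators: a confidence value and a fire radiative power value measured in megawatts. In the FIRMS products, MODIS reports confidence as a percentage from 0 to 100, while VIIRS reports it as one of three classes (low, nominal, high), which has to be mapped onto the same 0 to 100 scale before the two feeds are combined. Confidence reflects how certain the instrument is that the reading is a genuine thermal anomaly rather than sensor noise or cloud interference. Radiative power reflects the intensity of heat being released at the surface.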

High confidence combined with high radiative power is a strong indicator. Low confidence readings are included in the model but weighted down proportionally.


02 — baseline deviation

Today versus recent history

The most diagnostic feature in the model is not the raw hotspot count. It is the ratio of today's count to the rolling 7-day average for that cell. A cell that normally shows 2 hotspots per day and shows 18 today has a ratio of 9. A cell that normally shows 15 and shows 18 has a ratio of 1.2.

Both cells show the same count today. What differs is the deviation from baseline. This ratio is what separates a genuinely unusual day from a cell that is simply active by nature.


03 — spatial coherence

Why neighbours matter

A single anomalous cell is weak evidence. Before an alert is generated, the system checks whether surrounding H3 cells are also showing elevated anomaly scores. A wildfire spreads across an area. A sensor artefact or isolated industrial heat source does not.

Spatial clustering across adjacent cells is the final corroborating signal. Flagged cell-days with no supporting signal in their neighbours are filtered out before ranking.


04 — alert output

What the alert tells you

The output is a ranked signal, not a confirmation. A high-ranked alert means multiple independent indicators — the satellite reading, the deviation from baseline, and the spatial pattern across neighbouring cells — are all pointing in the same direction.

The further down the ranked list, the thinner that supporting evidence becomes. Ground verification is still required. The system's job is to tell you where to look first.


Open source and adaptable

The full pipeline — ingestion, spatial aggregation, model training, alert generation, and the dashboard — is released under the MIT licence. The codebase is built for Indonesia but the architecture is not Indonesia-specific. The H3 grid works globally. NASA FIRMS covers the world. Isolation Forest has no geographic assumptions.

Adapting the system to another region means configuring the geographic bounds for data ingestion, retraining the model on local data, and adjusting the contamination parameter to match regional fire prevalence. The core logic does not change.
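The region-specific settings named above could be gathered in one small configuration object. The sketch below is hypothetical: the class, its fields, the approximate bounding box for Indonesia, and the contamination value are assumptions chosen for illustration rather than values taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionConfig:
    name: str
    min_lat: float
    max_lat: float
    min_lng: float
    max_lng: float
    contamination: float  # expected share of anomalous cell-days

    def contains(self, lat: float, lng: float) -> bool:
        """True if the point lies inside the bounding box.

        Does not handle boxes that cross the antimeridian.
        """
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lng <= lng <= self.max_lng)


# Approximate bounds and an assumed contamination value, for illustration.
INDONESIA = RegionConfig(
    name="indonesia",
    min_lat=-11.0, max_lat=6.0,
    min_lng=95.0, max_lng=141.0,
    contamination=0.1,
)

in_kalimantan = INDONESIA.contains(-2.2, 113.9)
in_paris = INDONESIA.contains(48.85, 2.35)
```

Under this arrangement, moving to another region would mean defining a second RegionConfig and retraining on the data that falls inside it.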

Built by Itsavirus

github.com/Itsavirus-com/anomalous-wildfire-hotspots-detection
