Time-Series · Environmental AI · Disaster Management · Climate Modeling · Regional Forecasting

Forest Fire Forecasting in Vietnam

A time-series and machine learning study forecasting monthly forest fire frequency across Vietnam's regions, using seasonal climate patterns to support disaster prevention planning.

Role: ML Engineer
Period: 2024-11 to 2025-01
Status: Completed

  • Regions covered: 6
  • Time-series length: 15 years
  • Models evaluated: 8
  • Prediction horizon: 12 months

Technology Stack

Python, Pandas, NumPy, Scikit-learn, Statsmodels, ARIMA, Matplotlib, Seaborn

Project Overview

Vietnam faces severe deforestation threats from uncontrolled forest fires. Annually, thousands of hectares burn — causing environmental damage, air pollution, and economic loss.

This project develops a predictive disaster management system that forecasts regional forest fire frequency months in advance. Government agencies can then allocate firefighting resources and implement prevention campaigns proactively.

Environmental Challenge

Vietnamese Forest Crisis

  • Annual burned area: 2,000-5,000 hectares
  • Economic cost: Millions in reforestation & healthcare (respiratory issues from smoke)
  • Emissions: A major source of carbon emissions and regional air pollution
  • Current response: Reactive firefighting after ignition (too late to prevent)

Root Causes

  • Seasonal drought — dry season (October-April) creates tinderbox conditions
  • Agricultural slash-and-burn practices — uncontrolled land clearing
  • Climate variability — El Niño episodes worsen drought severity
  • Resource constraints — limited firefighting capacity, inadequate early warning systems

Solution: Predictive models enable proactive resource deployment and prevention campaigns before peak fire season arrives.

Data Engineering

Dataset Construction

  • Source: Vietnamese Ministry of Agriculture & Rural Development historical fire records
  • Geographic scope: 6 forest regions across northern, central, and southern Vietnam
  • Time coverage: 15 years of monthly fire frequency data (2009-2024)
  • Target variable: Number of fires per region per month
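The target variable can be built from raw fire records with a simple aggregation. This is a minimal pandas sketch. It assumes a raw table with hypothetical columns `date` and `region`, because the source does not describe the record schema. Months with no recorded fires are filled with zero so that every regional series has a complete monthly index.

```python
import pandas as pd

def monthly_counts(records: pd.DataFrame) -> pd.DataFrame:
    """Aggregate individual fire records into a month-by-region table of fire counts.

    Assumes hypothetical columns 'date' (date of the fire) and 'region'.
    """
    month = pd.to_datetime(records["date"]).dt.to_period("M")
    counts = (
        records.assign(month=month)
        .groupby(["month", "region"])
        .size()
        .unstack(fill_value=0)  # one column per region; regions without fires that month get 0
    )
    # Months with no recorded fire in any region are absent after groupby; add them back as zeros
    full_range = pd.period_range(counts.index.min(), counts.index.max(), freq="M")
    return counts.reindex(full_range, fill_value=0)
```

Each column of the result is then one region's monthly series, ready for the per-region models below.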

Feature Engineering for Regional Forecasts

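```python
# Seasonal decomposition of one region's monthly fire counts
from statsmodels.tsa.seasonal import seasonal_decompose

# fire_counts: pandas Series of monthly fire counts for one region
# (period=12 requires at least 24 observations)
decomposition = seasonal_decompose(fire_counts, model='additive', period=12)
trend = decomposition.trend        # centered 12-month moving average (NaN at both ends)
seasonal = decomposition.seasonal  # repeating within-year pattern
residual = decomposition.resid     # remainder after removing trend and seasonality
```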

Key Insights:

  • Strong annual seasonality — fire counts spike Oct-March (dry season)
  • Regional variation — Central Highlands most fire-prone due to elevation & climate
  • Multi-year trend — slight upward drift (more fires over time)
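You can check the seasonality insight directly by averaging each calendar month across years. This is a minimal pandas sketch. It assumes `fire_counts` is a monthly Series with a DatetimeIndex. The helper names `monthly_profile` and `peak_months` are hypothetical.

```python
import pandas as pd

def monthly_profile(fire_counts: pd.Series) -> pd.Series:
    """Mean fire count for each calendar month (1 = January ... 12 = December).

    Assumes fire_counts has a DatetimeIndex with one entry per month.
    """
    return fire_counts.groupby(fire_counts.index.month).mean()

def peak_months(fire_counts: pd.Series, top_n: int = 6) -> list[int]:
    """Calendar months with the highest average fire counts, in calendar order."""
    profile = monthly_profile(fire_counts)
    return sorted(profile.nlargest(top_n).index.tolist())
```

If the October-March dry-season pattern holds for a region, `peak_months(fire_counts)` should return months 1-3 and 10-12.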

Methodology: Multi-Model Ensemble

Model 1: ARIMA (AutoRegressive Integrated Moving Average)

For baseline univariate forecasting

  • Model: ARIMA(p,d,q), with orders selected automatically by AIC
  • Assumption: Past fire counts predict future counts
  • Strength: Captures temporal dependencies; a mature, widely used baseline

Model 2: Seasonal ARIMA (SARIMA)

Captures monthly seasonality

  • Model: SARIMA(p,d,q)(P,D,Q,m) with m=12 (monthly)
  • Advantage: Explicitly models recurring seasonal patterns
  • Result: About 17% lower RMSE than plain ARIMA (6.92 vs. 8.34)

Model 3: Exponential Smoothing (Triple ES)

For trend + seasonal components

  • Method: Holt-Winters Exponential Smoothing
  • Type: Additive (fire counts = trend + seasonal + random)
  • Use case: Short-term 3-6 month forecasts

Models 4-8: Machine Learning and Ensemble Methods

  • Random Forest: Averages many decision trees to capture nonlinear patterns
  • Gradient Boosting (XGBoost): Builds decision trees sequentially, each correcting the errors of the previous ones
  • Neural Networks (LSTM): Maps sequences of past fire counts to future counts
  • Hybrid SARIMA-RF: A Random Forest fitted to SARIMA residuals
  • Weighted Ensemble: Combines the best-performing models
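Tree-based models such as Random Forest need the series recast as a supervised learning problem. This is a minimal scikit-learn sketch that uses 12 lagged monthly counts as features, so lag 12 carries the same month of the previous year. The source does not describe the actual feature set. `make_lag_features` and `fit_rf_one_step` are hypothetical helpers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_lag_features(series, n_lags=12):
    """Turn a 1-D series into (X, y): row i holds the n_lags values before y[i]."""
    values = np.asarray(series, dtype=float)
    X = np.array([values[i - n_lags:i] for i in range(n_lags, len(values))])
    y = values[n_lags:]
    return X, y

def fit_rf_one_step(series, n_lags=12, seed=0):
    """Fit a Random Forest on lagged counts and predict the next month."""
    X, y = make_lag_features(series, n_lags)
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X, y)
    # Most recent n_lags months form the input window for next month's prediction
    last_window = np.asarray(series, dtype=float)[-n_lags:].reshape(1, -1)
    return model, float(model.predict(last_window)[0])
```

A 12-month forecast would either feed each prediction back in as the newest lag (recursive) or train one model per horizon step (direct).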

Results Comparison

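Model Performance (RMSE, lower is better)

| Model | RMSE | Interpretation |
|---|---|---|
| ARIMA | 8.34 | Baseline |
| SARIMA | 6.92 | 17% lower than ARIMA |
| Exponential Smoothing | 7.18 | Competitive with SARIMA |
| Random Forest | 5.84 | Second-best single model |
| XGBoost | 5.21 | Best single model |
| LSTM | 6.45 | Competitive deep learning model |
| Hybrid SARIMA-RF | 6.78 | Marginal gain over SARIMA; worse than Random Forest alone |
| Weighted Ensemble | 4.89 | Best overall |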
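The source does not say how the ensemble weights were chosen. One common scheme, shown here as an assumption, weights each model by the inverse of its validation RMSE. `inverse_rmse_weights` and `weighted_ensemble` are hypothetical names.

```python
import numpy as np

def inverse_rmse_weights(rmses: dict[str, float]) -> dict[str, float]:
    """Weight each model by 1/RMSE, normalized so the weights sum to 1."""
    inv = {name: 1.0 / r for name, r in rmses.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}

def weighted_ensemble(forecasts: dict[str, np.ndarray], weights: dict[str, float]) -> np.ndarray:
    """Combine per-model forecasts over the same horizon into one weighted forecast."""
    return sum(weights[name] * np.asarray(f, dtype=float) for name, f in forecasts.items())
```

Applying this scheme to the XGBoost, Random Forest and SARIMA RMSEs in the table (5.21, 5.84, 6.92) gives weights of roughly 0.38, 0.34 and 0.28. The RMSEs used for weighting should come from a validation period rather than the test period; otherwise, the ensemble's reported error will be optimistic.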

Regional Breakdown (Next 12-Month Forecast Accuracy)

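| Region | Mean Fire Count | Forecast RMSE | % Error (RMSE / mean) |
|---|---|---|---|
| Central Highlands | 24.3 | 2.1 | 8.6% |
| North Vietnam | 12.7 | 1.8 | 14.2% |
| Southeast | 8.9 | 1.3 | 14.6% |
| South Central Coast | 11.2 | 1.6 | 14.3% |
| Mekong Delta | 5.4 | 0.9 | 16.7% |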

Central Highlands has the highest absolute RMSE (2.1) but the lowest relative error (8.6%), making it the most reliable forecast; its historical data are more stable thanks to consistent climate patterns.
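The % Error column is the forecast RMSE divided by the mean fire count, i.e. a normalized RMSE. For the Central Highlands, that is 100 × 2.1 / 24.3 ≈ 8.6%. Here is a small sketch of both metrics in plain Python; the function names are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def percent_error(actual, predicted):
    """RMSE as a percentage of the mean actual value (normalized RMSE)."""
    mean_actual = sum(actual) / len(actual)
    return 100.0 * rmse(actual, predicted) / mean_actual
```

Normalizing by the mean makes regions with very different fire volumes comparable. This is why the Mekong Delta, with the smallest absolute RMSE, still has the largest relative error.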

Key Analytical Insight

The weighted ensemble cut RMSE by 29% relative to the best classical model (SARIMA, 6.92 to 4.89), but seasonality remains the dominant driver.

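XGBoost alone beat SARIMA (5.21 vs. 6.92), yet the hybrid SARIMA-RF model barely improved on SARIMA (6.78 vs. 6.92). Once SARIMA removes the seasonal structure, little signal remains in the residuals that past fire counts can explain. This suggests:

  • Exogenous climate data is the bottleneck: models that see only past fire counts are limited by their input features
  • The hybrid approach likely underperformed because SARIMA had already captured the linear trend and seasonality cleanly, leaving residuals that tree-based models could not exploit without additional inputs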

Future improvement: Integrate external climate variables (temperature, precipitation, El Niño index) as features.

Business Impact & Deployment

Resource Allocation Framework

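```python
def allocate_firefighting_resources(
    forecast: dict[str, float],
    history: dict[str, tuple[float, float]],
    annual_budget: float = 100_000_000,  # annual budget (e.g. in VND)
) -> dict[str, float]:
    """Maps a 12-month fire forecast to a per-region budget allocation.

    history holds each region's historical (mean, std) fire count.
    """
    weights = {}
    for region, predicted_fires in forecast.items():
        mean, std = history[region]
        if predicted_fires > mean + 2 * std:
            weights[region] = 0.35  # HIGH RISK
        elif predicted_fires > mean:
            weights[region] = 0.25  # MEDIUM RISK
        else:
            weights[region] = 0.15  # LOW RISK

    # Treat the tier shares as relative weights and normalize them; fixed
    # shares could overspend the budget (six HIGH-risk regions = 210%).
    total = sum(weights.values())
    return {region: annual_budget * w / total for region, w in weights.items()}
```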

Expected Outcomes (if deployed)

  • 40% reduction in fire spread — earlier firefighting interventions
  • $500K+ annual savings — optimized resource deployment
  • 15% less burned area — prevention campaigns in high-risk periods
  • Better air quality — fewer fires → reduced regional haze

Model Limitations & Future Work

  1. External variables missing — temperature, precipitation, humidity not yet integrated
  2. Climate regime shifts — El Niño/La Niña episodes not explicitly modeled
  3. Human factors — prevention policies, agricultural practices not captured
  4. Spatial autocorrelation — fires in one region influence nearby regions (not modeled)
  5. Data quality — some historical records incomplete; bias toward larger fires

Recommended Enhancements

  • Climate feature engineering: Incorporate the Southern Oscillation Index (SOI, published by NOAA) and regional rainfall
  • Spatial models: Vector AutoRegression (VAR) across regions
  • Anomaly detection: Flag unusually severe fire months for investigation
  • Causal inference: Identify which policy interventions actually reduce fires

Conclusion

This project demonstrates that specialized time-series techniques, regional decomposition, and ensemble methods can forecast monthly forest fire frequency months in advance, enabling proactive disaster management.

The 4.89 RMSE ensemble forecast translates to actionable resource allocation decisions — potentially saving lives and ecosystems across Vietnam's forest regions.