Before I was optimising inference at Arm, I was staring at satellite images of forests on fire.
I'm Usamah Zaheer, and my research at the University of Leicester focused on applying deep learning to high-resolution satellite imagery — specifically, using CNNs for environmental monitoring tasks like forest fire detection. It was my first serious encounter with the gap between "model works on a benchmark" and "model works on real data," and the lessons from that experience have shaped everything I've done since.
The AI4EO challenge
AI for Earth Observation (AI4EO) is a field where the stakes are tangible. Satellite imagery provides a global, continuous data source for monitoring environmental changes — deforestation, urban sprawl, crop health, and natural disasters. The challenge is that satellite images are massive (a single Sentinel-2 scene covers 100km × 100km at 10m resolution), multispectral (up to 13 bands vs. RGB's 3), and arrive in torrents (Sentinel-2 revisits every 5 days).
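Scenes that large can't be fed to a model whole, so pipelines typically tile them into fixed-size patches first. A minimal NumPy sketch of non-overlapping tiling (the array shapes are illustrative, not the exact Sentinel-2 product layout):

```python
import numpy as np

def tile_scene(scene: np.ndarray, patch: int = 256) -> np.ndarray:
    """Split a (bands, H, W) scene into non-overlapping (bands, patch, patch) tiles.

    Edge pixels that don't fill a full tile are dropped for simplicity;
    real pipelines typically pad or use overlapping windows instead.
    """
    bands, h, w = scene.shape
    rows, cols = h // patch, w // patch
    trimmed = scene[:, :rows * patch, :cols * patch]
    tiles = (trimmed
             .reshape(bands, rows, patch, cols, patch)
             .transpose(1, 3, 0, 2, 4)      # (rows, cols, bands, patch, patch)
             .reshape(rows * cols, bands, patch, patch))
    return tiles

# A toy 13-band "scene"; a real 10m Sentinel-2 tile is roughly 10980 x 10980 pixels.
scene = np.zeros((13, 1024, 1024), dtype=np.float32)
print(tile_scene(scene).shape)  # (16, 13, 256, 256)
```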
The forest fire detection task was representative of the broader AI4EO challenge: given a time series of satellite images, classify regions as burned, actively burning, or unburned. The data was messy — cloud cover obscured large portions of images, atmospheric conditions varied between captures, and the definition of "burned" was surprisingly subjective at the boundaries. Smoke, shadows, and certain soil types could all look like burn scars to a naive model.
What made this work rewarding was its directness. A model that correctly identifies a forest fire early can trigger a response that saves ecosystems and lives. That connection between the technical work and real-world impact is something I've sought out in every role since.
CNNs, random forests, and SVMs: model selection in practice
One of the most valuable things I learned during this research was how to make principled model selection decisions. The deep learning hype cycle suggests that CNNs are always the answer, but for satellite imagery classification, the reality was more nuanced.
CNNs excelled at spatial pattern recognition. Burn scars have distinctive spatial textures — irregular edges, gradient patterns from fire spread direction, and characteristic spectral signatures. CNNs, especially architectures pre-trained on ImageNet and fine-tuned on satellite data (transfer learning), captured these patterns effectively. We used ResNet and VGG variants, adapting the input layers to handle multispectral data rather than RGB.
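One common way to adapt an RGB-pretrained first convolution to multispectral input is to initialise the extra input channels from the pretrained filters. The sketch below shows that weight inflation in NumPy; the exact band count and the mean-based initialisation are assumptions for illustration, not the precise recipe we used:

```python
import numpy as np

def inflate_first_conv(rgb_weights: np.ndarray, in_channels: int = 13) -> np.ndarray:
    """Expand (out, 3, k, k) pretrained conv weights to (out, in_channels, k, k).

    Each new input channel starts from the mean of the RGB filters, scaled by
    3 / in_channels so the expected activation magnitude is roughly preserved.
    """
    mean_filter = rgb_weights.mean(axis=1, keepdims=True)     # (out, 1, k, k)
    inflated = np.repeat(mean_filter, in_channels, axis=1)    # (out, C, k, k)
    inflated *= 3.0 / in_channels
    # Keep the original RGB filters in the first three channels; mapping
    # channel order onto actual satellite bands is a modelling choice.
    inflated[:, :3] = rgb_weights * (3.0 / in_channels)
    return inflated

rgb = np.random.randn(64, 3, 7, 7).astype(np.float32)  # ResNet-style first conv
print(inflate_first_conv(rgb).shape)  # (64, 13, 7, 7)
```

The rest of the pretrained backbone is left untouched, so fine-tuning can proceed as usual.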
Random forests provided a strong baseline. For pixel-level classification using handcrafted spectral features (like the Normalised Burn Ratio, NBR), random forests were competitive with CNNs and significantly faster to train and deploy. They also provided interpretable feature importances, which was valuable for validating that the model was using physically meaningful signals rather than dataset artifacts.
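The NBR itself is a simple band ratio, and the difference between pre- and post-fire NBR (dNBR) is what typically separates burn scars from healthy vegetation. A small sketch, where the band choices follow the Sentinel-2 convention (NIR is B8, SWIR is B12) and the threshold is illustrative rather than a published cut-off:

```python
import numpy as np

def nbr(nir: np.ndarray, swir: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalised Burn Ratio: (NIR - SWIR) / (NIR + SWIR)."""
    return (nir - swir) / (nir + swir + eps)

# Toy reflectance values for two pixels, before and after a fire event.
pre_nir, pre_swir = np.array([0.45, 0.40]), np.array([0.10, 0.12])
post_nir, post_swir = np.array([0.18, 0.38]), np.array([0.25, 0.13])

# dNBR = pre-fire NBR minus post-fire NBR; large positive values suggest burn.
dnbr = nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)
burned = dnbr > 0.27  # threshold varies by biome and sensor in practice
print(burned)  # [ True False]
```

Features like this, stacked per pixel, are exactly what the random forest consumed, and its feature importances confirmed the model leaned on them rather than on artifacts.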
SVMs were effective for small-sample regimes. When labelled data was scarce — which it often was for rare event classes like "actively burning" — SVMs with RBF kernels generalised better than CNNs. The RBF kernel's smoothness assumption acted as a useful inductive bias, regularising the decision boundary where a high-capacity network would overfit the handful of examples.
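A toy illustration of that regime with scikit-learn: a few dozen samples of synthetic spectral features per class, which is far too little to fine-tune a CNN but enough for an RBF SVM. The feature values and class clusters here are invented for the example:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Tiny synthetic dataset: 20 samples per class, 4 spectral features each.
burning = rng.normal(loc=[0.8, 0.2, 0.9, 0.1], scale=0.05, size=(20, 4))
unburned = rng.normal(loc=[0.2, 0.7, 0.1, 0.8], scale=0.05, size=(20, 4))
X = np.vstack([burning, unburned])
y = np.array([1] * 20 + [0] * 20)

# RBF kernel; in practice C and gamma are chosen by cross-validation.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(clf.predict([[0.78, 0.22, 0.88, 0.12]]))  # near the "burning" cluster
```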
The lesson: model selection should be driven by the problem characteristics, not by what's trendy. Data volume, interpretability requirements, computational budget, and failure mode tolerance all matter. This is the same framework I apply today when deciding between model architectures for edge deployment — the "best" model is the one that meets all your constraints, not the one with the highest accuracy on a leaderboard.
Autonomous vehicle object detection
My thesis work extended beyond satellite imagery into autonomous vehicle perception — specifically, object detection using LiDAR point clouds and camera fusion. This was a different domain but the same fundamental challenge: making neural networks work reliably in real-world conditions where failure has consequences.
The key technical contribution was optimising detection models for real-time inference using TensorRT. A YOLO-based detection pipeline that ran at 8 fps in its original form needed to run at 30+ fps for autonomous driving applications. Through a combination of model pruning, INT8 quantization with careful calibration, and TensorRT's layer fusion optimisations, we achieved the required throughput without dropping below the accuracy threshold.
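"Careful calibration" is doing a lot of work in that sentence. TensorRT handles calibration through its own API, but the core trade-off can be shown in a few lines of NumPy: a naive max-abs scale lets rare outliers waste most of the INT8 range, while a percentile-based scale clips the outliers and spends the range on the bulk of the distribution. This sketch is illustrative; it is not the calibrator we actually shipped:

```python
import numpy as np

def quantize_int8(x: np.ndarray, scale: float) -> np.ndarray:
    """Symmetric INT8 quantize-dequantize: values map to [-127, 127] * scale."""
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
# Activations with rare large outliers, a common pattern in detection heads.
acts = np.concatenate([rng.normal(0.0, 1.0, 10_000), [40.0, -35.0]])

# Naive calibration: scale from the absolute max, dominated by the outliers.
naive = quantize_int8(acts, np.abs(acts).max() / 127)
# Percentile calibration: clip outliers, resolve the bulk more finely.
calibrated = quantize_int8(acts, np.percentile(np.abs(acts), 99.9) / 127)

for name, deq in [("max-abs", naive), ("percentile", calibrated)]:
    print(name, "median abs error:", float(np.median(np.abs(acts - deq))))
```

The percentile scheme has a much lower typical error because almost all values land in a finer grid; the price is large error on the two clipped outliers, which is why calibration has to be validated against task accuracy, not just tensor-level error.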
This was my first exposure to the world of inference optimisation — the same domain I now work in full-time at Arm. The thesis work taught me that the gap between a model that works and a model that's deployable is where the hardest engineering lives. Accuracy is necessary but not sufficient. Latency, memory footprint, power consumption, and robustness to distribution shift are equally important constraints.
That thesis work on TensorRT-optimised object detection laid the foundation for my subsequent career in edge ML inference: first at Dyson, deploying perception models on robots, and then at Arm, building optimisation tools for the broader edge ecosystem.
From research to industry
The transition from academic research to industry engineering was eye-opening. In academia, you optimise for novelty and benchmark performance. In industry, you optimise for reliability, maintainability, and cost.
Reproducibility discipline. Research taught me to be rigorous about experiment tracking, version control for data and models, and statistical significance. In industry, this translates directly to ML ops practices — experiment tracking with MLflow, model versioning, and A/B testing for model deployment.
First-principles thinking. Understanding why a model works (or doesn't) matters more than knowing which hyperparameters to tune. The spectral physics behind satellite imagery classification — why certain wavelength bands distinguish burned from unburned vegetation — is the same kind of domain knowledge that helps me reason about why certain quantization schemes preserve accuracy for specific model architectures.
Communication skills. Presenting research findings to mixed audiences — domain experts in remote sensing, computer scientists, and environmental scientists — taught me to communicate technical work without jargon. That skill was directly useful when I later presented VLM work to Dyson's CEO and when I collaborate across teams at Arm.
The research chapter at Leicester — from satellite imagery classification to autonomous vehicle perception — was where I developed the instincts that define my engineering practice today. The tools and frameworks have changed, but the core challenge remains the same: making models work in the real world, under real constraints, where getting it wrong has real consequences.
That's the thread that connects Leicester to Dyson to Arm. And it's the thread I'm continuing to pull at UT Austin.