Referring expression segmentation is a fundamental task in computer vision that integrates natural language understanding with precise visual localization of target regions. Aerial imagery (for example, modern drone surveys, historic aerial archives, and high-resolution satellite captures) introduces unique challenges: spatial resolution varies widely across datasets, color usage is inconsistent, targets often shrink to only a few pixels, and scenes contain extreme object densities with frequent occlusions. This work presents Aerial-D, a new large-scale referring expression segmentation dataset for aerial imagery. It comprises 37,288 image patches with 1,522,523 referring expressions covering 259,709 annotated targets. Targets include individual instances, coherent groups, and semantic categories, and span 21 distinct classes that range from vehicles and infrastructure to land-cover types.
The dataset is constructed through a fully automatic pipeline that combines systematic rule-based expression generation with large language model (LLM) enhancement, enriching both linguistic variety and the visual detail captured within each description. As an additional capability, the pipeline produces dedicated historic counterparts for every scene, supporting archival analyses such as monitoring urban change across decades. We adopt the RSRefSeg architecture, which features a SigLIP2 vision-language encoder and a SAM segmentation decoder, and train models on Aerial-D together with prior aerial datasets. The resulting models perform unified instance and semantic segmentation from text for both modern and historic imagery. Results show that this combined training achieves competitive performance on contemporary benchmarks while maintaining strong accuracy under the monochrome, sepia, and grainy degradations that characterize archival aerial photography.
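To make the three degradations named above concrete, here is a minimal NumPy sketch of how a modern RGB patch could be turned into a monochrome, sepia, or grainy counterpart. It is an illustration under stated assumptions, not the Aerial-D pipeline itself: the function names, the Rec. 601 luma weights, the sepia matrix, and the noise level `sigma` are all hypothetical choices made for this example.

```python
import numpy as np

# Rec. 601 luma weights, a common RGB-to-gray conversion (assumed here).
LUMA = np.array([0.299, 0.587, 0.114])

# A widely used sepia tone matrix (assumed; not taken from the Aerial-D code).
# Row j holds the RGB weights that produce output channel j.
SEPIA = np.array([
    [0.393, 0.769, 0.189],
    [0.349, 0.686, 0.168],
    [0.272, 0.534, 0.131],
])


def _to_uint8(arr):
    """Round, clip to [0, 255], and cast back to uint8."""
    return np.clip(np.rint(arr), 0, 255).astype(np.uint8)


def to_monochrome(img):
    """Collapse an HxWx3 uint8 RGB image to gray, replicated over 3 channels."""
    gray = _to_uint8(img.astype(np.float64) @ LUMA)
    return np.repeat(gray[..., None], 3, axis=-1)


def to_sepia(img):
    """Apply the sepia matrix to every pixel of an HxWx3 uint8 RGB image."""
    return _to_uint8(img.astype(np.float64) @ SEPIA.T)


def add_grain(img, sigma=12.0, seed=0):
    """Add Gaussian luminance noise (identical across channels) as film grain."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=img.shape[:2])
    return _to_uint8(img.astype(np.float64) + noise[..., None])


if __name__ == "__main__":
    patch = np.full((4, 4, 3), 100, dtype=np.uint8)  # synthetic stand-in patch
    print(to_monochrome(patch)[0, 0], to_sepia(patch)[0, 0])
    print(add_grain(patch, seed=1)[0, 0])
```

All three transforms are purely photometric, so pixel positions do not move and a segmentation mask drawn on the modern patch stays aligned with its degraded counterpart.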
This dataset builds upon two foundational aerial imagery datasets:
If you use this dataset or code, please cite:
@article{marnoto2025aeriald,
title={The Aerial-D Dataset for Generalized Referring Expression Segmentation on Aerial Photos},
author={Marnoto, Luís Pedro Soares},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (J-STARS)},
year={2025},
note={Submitted}
}