Foreground Attention Loss and Attention-Guided Convolution for Remote Sensing Object Detection

Published in: IEEE Sensors Journal (Volume 26, Issue 1, January 2026)

Authors: Tian Hongxian, Li Yunyue, Liu Jueting, Wang Zehua, Wang Zilong, Xu Tingting, Xu Zishan, Yang Wei, Chen Wei

DOI: https://doi.org/10.1109/JSEN.2025.3624139

Summary Contributed by: Zilong Wang (Author)

Remote-sensing imagery from satellites and drones is increasingly used for tasks such as urban planning, maritime monitoring, agriculture, and disaster response and management. Unlike typical photographs, these images cover extensive areas in which objects such as ships, vehicles, aircraft, and buildings may appear tiny, densely packed, and at arbitrary orientations. This combination of tiny targets and cluttered backgrounds can cause detectors to miss objects or raise numerous false alarms.

Two-stage detectors first generate candidate regions and then refine them, which helps separate objects from the background. However, the proposal and region-processing steps add latency and memory cost. One-stage detectors are faster because they predict object categories and bounding boxes directly from dense feature maps, yet they often lack an explicit mechanism for identifying the true foreground. As a result, they may spend much of their computation scoring background regions and sampling features from uninformative pixels, especially in scenes with many small, rotated objects.

This work introduces a lightweight “discover→focus” strategy for one-stage rotated object detection. The first component is Foreground Attention Loss (FAL). During training, the researchers transformed oriented bounding-box annotations into a rotation-aware foreground density map: each labelled object contributes a compact, rotated kernel, and together these kernels form a heatmap of likely object locations. The researchers then supervised the detector's classification pathway so that the foreground attention map it predicts matches this density map across multiple feature scales. This gives the model an explicit, spatially grounded notion of foreground density, encouraging it to highlight true object regions while suppressing cluttered backgrounds.
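The summary does not specify the exact kernel or loss used for FAL, so the following is only a rough sketch under stated assumptions. It assumes each oriented box (cx, cy, w, h, θ) is rendered as a 2-D Gaussian whose covariance is aligned with the box, that overlapping kernels are merged with an element-wise maximum, and that the predicted attention at each FPN stride is compared with the rendered map using binary cross-entropy. The function names `rotated_gaussian_density` and `foreground_attention_loss`, the `scale` factor, and the choice of loss are hypothetical, not taken from the paper.

```python
import numpy as np


def rotated_gaussian_density(boxes, height, width, stride=1, scale=0.5):
    """Render a rotation-aware foreground density map (hypothetical helper).

    boxes: array of rows (cx, cy, w, h, theta) in image pixels, theta in radians.
    Assumption: each box becomes a Gaussian with covariance
    R(theta) @ diag((scale*w/2)^2, (scale*h/2)^2) @ R(theta).T,
    and overlapping kernels are merged with an element-wise max.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    # Centres of the feature cells at this stride, in image coordinates.
    px = (xs + 0.5) * stride
    py = (ys + 0.5) * stride
    density = np.zeros((height, width))
    for cx, cy, w, h, theta in boxes:
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s], [s, c]])
        diag = np.diag([(scale * w / 2) ** 2, (scale * h / 2) ** 2])
        inv = np.linalg.inv(rot @ diag @ rot.T)
        dx, dy = px - cx, py - cy
        # Squared Mahalanobis distance of every cell to the box centre.
        m = inv[0, 0] * dx * dx + 2 * inv[0, 1] * dx * dy + inv[1, 1] * dy * dy
        density = np.maximum(density, np.exp(-0.5 * m))
    return density


def foreground_attention_loss(pred_maps, boxes, strides, eps=1e-6):
    """Assumed form: soft-target binary cross-entropy, averaged over FPN levels.

    pred_maps: list of predicted attention maps (values in (0, 1)), one per stride.
    """
    total = 0.0
    for pred, stride in zip(pred_maps, strides):
        h, w = pred.shape
        target = rotated_gaussian_density(boxes, h, w, stride=stride)
        p = np.clip(pred, eps, 1 - eps)
        bce = -(target * np.log(p) + (1 - target) * np.log(1 - p))
        total += bce.mean()
    return total / len(pred_maps)


if __name__ == "__main__":
    # One elongated box rotated 45 degrees in a 64x64 image.
    boxes = np.array([[32.0, 32.0, 24.0, 8.0, np.pi / 4]])
    dmap = rotated_gaussian_density(boxes, 64, 64)
    print(dmap.shape)  # (64, 64)
    # Uninformative predictions at strides 8 and 16 (8x8 and 4x4 maps).
    preds = [np.full((8, 8), 0.5), np.full((4, 4), 0.5)]
    print(foreground_attention_loss(preds, boxes, strides=[8, 16]))
```

Because the Gaussian's axes follow the box, the target is elongated along a ship or runway rather than circular, which is one plausible reading of "rotation-aware". A faithful implementation may use a different kernel shape, merging rule, or loss weighting.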

The second component is Attention-Guided Convolution (AGConv), which refines features by concentrating sampling on important regions. Instead of using an extra “offset prediction” branch to learn deformable sampling offsets, which increases parameter and memory usage, AGConv computes sampling offsets in closed form. It does so by combining two types of cues: (1) geometric information from the detector's local box hypotheses (centre, size, and orientation), which aligns the sampling pattern with the structure of rotated objects; and (2) semantic information from the learned foreground attention, which steers sampling points toward nearby foreground centroids when the current location resembles background. This semantic–geometric fusion helps the model focus computation where it matters while keeping its cost close to that of a standard 3×3 convolution.
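To make the closed-form offset idea concrete, here is a single-location NumPy sketch. It hypothetically assumes that the geometric cue places the nine taps of a 3×3 kernel at the centres of a 3×3 split of the predicted oriented box (similar in spirit to the AlignConv operator of S2A-Net), and that the semantic cue shifts every tap toward the attention-weighted centroid of a small window, scaled by (1 - a) so that locations already judged to be foreground barely move. The helper names, the `radius` window, and the (1 - a) weighting are illustrative assumptions; the paper's actual fusion rule may differ. Note that no learned offset branch appears: every offset is computed directly from the box and the attention map.

```python
import numpy as np

# (dx, dy) of a 3x3 kernel in row-major order, matching weight[row=dy, col=dx].
KERNEL = np.array([(i, j) for j in (-1, 0, 1) for i in (-1, 0, 1)], dtype=float)


def geometric_points(box):
    """Assumed geometric cue: 3x3 taps at cell centres of the oriented box.

    box = (cx, cy, w, h, theta) in feature-grid units.
    """
    cx, cy, w, h, theta = box
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    local = KERNEL * np.array([w / 3.0, h / 3.0])  # box-aligned grid before rotation
    return np.array([cx, cy], dtype=float) + local @ rot.T


def foreground_centroid(att, x, y, radius=2):
    """Attention-weighted centroid (x, y) of a window around (x, y)."""
    H, W = att.shape
    y0, y1 = max(0, y - radius), min(H, y + radius + 1)
    x0, x1 = max(0, x - radius), min(W, x + radius + 1)
    win = att[y0:y1, x0:x1]
    ys, xs = np.mgrid[y0:y1, x0:x1]
    total = win.sum()
    if total < 1e-6:  # no foreground nearby: no semantic pull
        return np.array([x, y], dtype=float)
    return np.array([(win * xs).sum() / total, (win * ys).sum() / total])


def agconv_offsets(att, box, x, y, radius=2):
    """Closed-form (9, 2) offsets relative to a plain 3x3 grid at (x, y)."""
    centre = np.array([x, y], dtype=float)
    geo = geometric_points(box)
    # Assumed semantic cue: pull toward foreground, weighted by (1 - attention).
    shift = (1.0 - att[y, x]) * (foreground_centroid(att, x, y, radius) - centre)
    return geo + shift - (centre + KERNEL)


def bilinear(feat, px, py):
    """Bilinearly sample a 2-D map; out-of-bounds neighbours contribute zero."""
    H, W = feat.shape
    x0, y0 = int(np.floor(px)), int(np.floor(py))
    val = 0.0
    for xi in (x0, x0 + 1):
        for yi in (y0, y0 + 1):
            if 0 <= xi < W and 0 <= yi < H:
                val += (1 - abs(px - xi)) * (1 - abs(py - yi)) * feat[yi, xi]
    return val


def agconv_at(feat, att, box, weight, x, y):
    """Single-channel deformable 3x3 response at (x, y) with closed-form offsets."""
    points = np.array([x, y], dtype=float) + KERNEL + agconv_offsets(att, box, x, y)
    samples = np.array([bilinear(feat, px, py) for px, py in points])
    return float(samples @ weight.reshape(-1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((9, 9))
    att = np.zeros((9, 9))
    att[4, 6] = 1.0  # a single foreground peak at (x=6, y=4)
    box = (4.0, 4.0, 3.0, 3.0, 0.0)  # axis-aligned box hypothesis at (4, 4)
    print(agconv_offsets(att, box, 4, 4))  # every tap shifted by (+2, 0)
    print(agconv_at(feat, att, box, np.ones((3, 3)) / 9.0, 4, 4))
```

When the attention is high everywhere and the box matches the kernel footprint, all offsets vanish and the operator reduces to an ordinary 3×3 convolution, which is consistent with the claim that its cost stays close to that of a standard convolution. A practical version would vectorise over all locations and channels, for example by feeding such offsets into an existing deformable-convolution kernel.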

Under a consistent ResNet-50 + FPN setting, the proposed pipeline improves detection accuracy on several widely used remote-sensing benchmarks, including DOTA-v1.0, DOTA-v1.5, DIOR-R, and HRSC2016, while adding negligible parameter and latency overhead. Qualitative analyses show that FAL produces sharper attention that is better aligned with true objects, and that AGConv leverages these improved cues to reduce missed detections and false positives in cluttered scenes.

The approach brings some of the benefits of “foreground selection” typically seen in two-stage systems into a practical one-stage detector. This makes it attractive for large-scale, resource-constrained remote-sensing applications, because it spends computation more efficiently without sacrificing one-stage speed. It also improves detection accuracy, especially for small objects, making remote sensing more reliable for real-world applications such as mapping and surveillance.
