YOLOX-SAR: High-Precision Object Detection System Based on Visible and Infrared Sensors for SAR Remote Sensing
Object detection in Synthetic Aperture Radar (SAR) remote sensing is a prominent interdisciplinary task with broad applications in fields such as military operations, agriculture, forestry management, and geological exploration. It is also multifaceted and challenging, sitting at the intersection of radar imaging, signal processing, image analysis, and artificial intelligence.
Recent advances in deep Convolutional Neural Networks (CNNs) face significant hurdles on SAR images because of varying object sizes, occlusion, and complex backgrounds, which make SAR detection notably harder than detection in RGB (Red, Green, Blue) imagery. Even established networks such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) struggle with the intricate electromagnetic wave scattering inherent in SAR images.
This paper introduces YOLOX-SAR, a system tailored specifically for precise object detection in SAR images, building upon the YOLOX architecture to address current limitations in SAR object detection.
Leveraging optimized configurations such as CSPDarknet53 and an FPN (Feature Pyramid Network), YOLOX-SAR integrates Meta-ACON and the SPP (Spatial Pyramid Pooling) module for efficient feature extraction, while CBAM (Convolutional Block Attention Module) generates attention maps.
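To make the activation concrete, here is a minimal pure-Python sketch of the ACON-C formula that Meta-ACON builds on: f(x) = (p1 − p2)·x·sigmoid(β(p1 − p2)·x) + p2·x. In Meta-ACON, the switching factor β is predicted per channel by a small learned network; here it is a fixed scalar for illustration, and the function name is ours, not the paper's.

```python
import math

def acon_c(x, p1=1.0, p2=0.0, beta=1.0):
    """ACON-C activation on a scalar: (p1 - p2)*x*sigmoid(beta*(p1 - p2)*x) + p2*x.
    In Meta-ACON, beta is learned per channel by a small network; here it is
    a fixed scalar for illustration."""
    d = (p1 - p2) * x
    return d / (1.0 + math.exp(-beta * d)) + p2 * x
```

With p1 = 1, p2 = 0, and β = 1 this reduces to the Swish activation; as β grows it approaches max(p1·x, p2·x), so the network can smoothly interpolate between linear and non-linear behavior.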
Additionally, data augmentation techniques such as MixUp and Mosaic strengthen the model's resilience across different SAR environments, improving accuracy and refining region identification.
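MixUp blends two training samples (and, likewise, their labels) with a coefficient λ, typically drawn from a Beta(α, α) distribution. A minimal sketch on flat lists of pixel values, assuming nothing about the paper's actual implementation:

```python
import random

def mixup(img_a, img_b, lam=None, alpha=0.2):
    """Blend two images (flat lists of pixel values) element-wise:
    mixed = lam * a + (1 - lam) * b. Labels are blended with the same lam."""
    if lam is None:
        lam = random.betavariate(alpha, alpha)  # lam in (0, 1)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(img_a, img_b)]
    return mixed, lam
```

Because the blended image contains traces of both sources, the label target becomes a weighted mixture as well, which regularizes the detector against memorizing individual backgrounds.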
The Meta-SPP module employs max-pooling operations to generate four feature maps of different sizes, which are then combined through a Concat operation to improve feature-extraction efficiency. The CBAM-FPN module, in turn, produces three distinct feature maps tailored to objects of different sizes, which are crucial for robust detection performance, particularly in SAR image scenarios.
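The SPP idea can be sketched in pure Python on a 1-D feature row: several stride-1 max-pools with "same" padding (so spatial size is preserved) run in parallel, and their outputs are concatenated with the input itself, yielding the four feature maps described above. The kernel sizes here (3, 5, 7) are illustrative; YOLO-style SPP typically uses 5, 9, and 13.

```python
def maxpool_same(row, k):
    """Stride-1 max-pool with 'same' padding over a 1-D list of floats."""
    pad = k // 2
    padded = [float("-inf")] * pad + list(row) + [float("-inf")] * pad
    return [max(padded[i:i + k]) for i in range(len(row))]

def spp(row, kernels=(3, 5, 7)):
    """Concatenate the identity map with one max-pooled map per kernel size,
    giving len(kernels) + 1 feature maps of equal length."""
    maps = [list(row)] + [maxpool_same(row, k) for k in kernels]
    out = []
    for m in maps:
        out.extend(m)
    return out
```

Because every pooled map keeps the input's spatial size, the concatenation simply widens the channel dimension, letting later layers see the same location at several receptive-field scales.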
Integrating these modules in YOLOX-SAR refines features and focuses attention effectively, addressing the complexities of SAR image detection.
YOLOX-SAR's performance is evaluated on the NWPU VHR-10 dataset, which includes 715 RGB images and 85 pan-sharpened color infrared images. These images are annotated with a total of 4300 objects across different categories, forming the basis for comparative analysis.
Detection performance is evaluated with five indices on this dataset, using a training approach that combines partial pre-training with a staged freeze/unfreeze process to refine the model and improve efficiency.
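The staged freeze/unfreeze schedule can be sketched as follows; the layer names and the epoch split are illustrative assumptions, not taken from the paper:

```python
def trainable_layers(epoch, freeze_epochs=50,
                     backbone=("stem", "dark2", "dark3", "dark4", "dark5"),
                     head=("fpn", "detect_head")):
    """Return the layer names updated at a given epoch. During the freeze
    stage only the detection head is trained on top of the (pre-trained)
    backbone; afterwards all layers are unfrozen and fine-tuned end to end.
    Layer names and the 50-epoch split are hypothetical examples."""
    if epoch < freeze_epochs:
        return list(head)                    # stage 1: backbone frozen
    return list(backbone) + list(head)       # stage 2: full fine-tuning
```

Freezing the pre-trained backbone first lets the randomly initialized head stabilize cheaply; unfreezing afterwards adapts the whole network to SAR-specific features.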
Experiments manipulating activation functions and network structures revealed significant gains in Backbone detection efficacy with the YOLOX model. Systematically inserting CBAM at different positions improved all models, with the best placement reaching an mAP of 87.31%, a 1.79% increase, alongside gains in Precision and Recall.
Furthermore, YOLOX-SAR achieved an mAP of 89.56%, a substantial 4.04% improvement over the baseline YOLOX, underscoring its efficacy and advancement in SAR image object detection.
Additionally, discussions of recent advances in activation functions and dynamic network architectures deepen the understanding of neural network optimization and point toward future innovations in object detection methodologies.