
fltech - Fujitsu Research Tech Blog

A technical blog where researchers at Fujitsu Research write about a wide range of topics

1st place in the Optimized Road Damage Detection Challenge (ORDDC'2024, IEEE BigData Cup)

Hello, I am Fangjun Wang from Fujitsu Research & Development Center Co., Ltd., Shanghai Laboratory. Recently, we participated in the Optimized Road Damage Detection Challenge (ORDDC'2024) and won first place on both the phase-one and phase-two leaderboards. In this blog post, I will introduce the challenge and our solution.

Overview of the challenge

The Optimized Road Damage Detection Challenge (ORDDC'2024) [1] addresses the problem of automating road damage detection (RDD) with a focus on optimizing inference speed and resource usage. Until now, the RDD challenges have prioritized enhancing the performance of RDD algorithms/models, with the F1-score serving as the primary (and sole) metric. Moving forward, however, it has become increasingly important to address resource constraints, particularly inference speed and memory usage, to enable real-time deployment of these models. The current challenge therefore shifts the primary criterion towards optimizing resource usage.

Fig.1 Example of Road Damage Dataset

Our solution

To address both the accuracy and speed requirements of the challenge, we propose our method which consists of three main stages, as illustrated in Fig.2. Each stage is designed to leverage the strengths of different models and learning techniques to improve overall performance.

Fig.2 Overview of our method, which consists of three stages.

Stage 1: Initial Training

In the first stage, we train three models independently on the given RDD2022 training dataset:

  • Co-DETR [2] RDDv1: A transformer-based object detection model.
  • RTMDet [3] RDDv1: A CNN-based real-time object detector.
  • YOLOv10 [4] RDDv1: A lightweight and fast object detection model.

Stage 2: Mutual Learning

In our method, we adopt a framework similar to the one shown in Fig.3, which trains the student models on pseudo labels, guided by a classification loss and a bounding-box loss. Co-DETR and RTMDet act as both teachers and students: the teacher models are trained on labeled data, while the student models learn from both labeled and unlabeled data. The key steps in this stage are: 1) generate pseudo labels for the test dataset using the models trained in Stage 1; 2) train Co-DETR RDDv2 and RTMDet RDDv2 on both the labeled training data and the pseudo-labeled test data; 3) use the classification loss and bounding-box loss to guide the learning process. This mutual learning process allows the models to benefit from each other's strengths and improves their performance on both labeled and unlabeled data.

Fig.3 Semi-supervised learning framework.
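The pseudo-label generation in step 1 can be sketched as follows. This is a minimal illustration, not our actual training code: the detection tuple format and the confidence cutoff `score_thr` are assumptions for the sketch.

```python
# Sketch of pseudo-label generation for the unlabeled test set (step 1).
# A detection is (class_id, score, x1, y1, x2, y2); score_thr is an
# assumed confidence cutoff, not a value from our pipeline.

def filter_pseudo_labels(detections, score_thr=0.5):
    """Keep only confident detections as pseudo labels."""
    return [d for d in detections if d[1] >= score_thr]

def build_stage2_dataset(labeled, unlabeled_preds, score_thr=0.5):
    """Combine ground-truth annotations with filtered pseudo labels.

    labeled:          {image_id: [annotation, ...]} from RDD2022 train
    unlabeled_preds:  {image_id: [detection, ...]} from Stage-1 models
    """
    pseudo = {img: filter_pseudo_labels(dets, score_thr)
              for img, dets in unlabeled_preds.items()}
    # Student models (Co-DETR RDDv2, RTMDet RDDv2) then train on both sources.
    return {**labeled, **pseudo}
```

The student models then see this combined dictionary as one dataset, with the classification and bounding-box losses applied uniformly to ground-truth and pseudo labels.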

Stage 3: Knowledge Distillation and Speed Optimization

In the final stage, we use Co-DETR and RTMDet as teacher models and distill their knowledge into YOLOv10, which serves as the student model. Specifically, we fine-tune YOLOv10 RDDv2 on the test dataset using pseudo labels generated by Co-DETR and RTMDet. On the one hand, the optimized model learns from both the transformer-based (Co-DETR) and CNN-based (RTMDet) teachers, and by focusing on the test dataset during this stage, we ensure that YOLOv10 is well optimized for the target domain. On the other hand, YOLOv10's lightweight architecture speeds up inference while preserving, as much as possible, the knowledge learned from the larger models.
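One way to combine the two teachers' outputs into a single set of pseudo labels for the student is IoU-based fusion. The rule below (average the scores of boxes the teachers agree on, keep unmatched boxes from either teacher) is a hypothetical illustration; the exact fusion rule we used is not detailed here.

```python
# Hypothetical sketch: fuse Co-DETR and RTMDet detections into one set
# of pseudo labels for the YOLOv10 student. A detection is
# (class_id, score, x1, y1, x2, y2).

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def fuse_teachers(codetr, rtmdet, iou_thr=0.5):
    """Average scores of same-class boxes the two teachers agree on
    (IoU >= iou_thr); keep unmatched boxes from either teacher as-is."""
    fused, used = [], set()
    for cls_a, score_a, *box_a in codetr:
        match = None
        for j, (cls_b, _, *box_b) in enumerate(rtmdet):
            if j not in used and cls_a == cls_b and iou(box_a, box_b) >= iou_thr:
                match = j
                break
        if match is not None:
            used.add(match)
            fused.append((cls_a, (score_a + rtmdet[match][1]) / 2, *box_a))
        else:
            fused.append((cls_a, score_a, *box_a))
    fused.extend(d for j, d in enumerate(rtmdet) if j not in used)
    return fused
```

Boxes confirmed by both teachers end up with an averaged score, so the student's training signal is strongest where the two architectures agree.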

Effect

Effect of stage 2

Fig.4 shows the mAP scores on the validation set for the RTMDet and Co-DETR models before and after Stage 2 (mutual learning). As can be seen, after Stage 2 the performance of both RTMDet and Co-DETR on the validation set improves, demonstrating the effectiveness of our mutual learning approach.

Fig. 4 Comparison on validation set during stage 2

Effect of our method

Fig.5 presents the F1-scores for each model and for previously proposed methods on different subsets of the test data, as well as the overall average. Notably, compared with RDDv1, the F1-score of YOLOv10 RDDv2 is greatly improved. We attribute this improvement to two factors: first, knowledge distillation lets YOLOv10 learn from both RTMDet and Co-DETR; second, focused training on the unlabeled test set tailors its performance to the target domain. Moreover, compared with several existing methods used in previous road damage detection challenges, our method achieves higher accuracy (0.86 vs. 0.75). The key improvements of our method are: first, we leverage the strengths of three different architectures, both transformer-based and CNN-based, at multiple model scales; second, through semi-supervised learning we utilize unlabeled data more effectively in our mutual learning stage.

Fig.5 F1-score comparison
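For readers unfamiliar with the evaluation metric, a minimal sketch of detection F1 follows. The matching rule shown (a prediction is a true positive when it matches an unused ground-truth box of the same class with IoU >= 0.5) is a common convention and an assumption here; the official ORDDC evaluation may differ in detail.

```python
# Minimal sketch of detection F1 at an assumed IoU threshold of 0.5.
# A prediction/ground truth is (class_id, x1, y1, x2, y2).

def box_iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def detection_f1(preds, gts, iou_thr=0.5):
    """Greedy one-to-one matching of predictions to ground truths."""
    tp, matched = 0, set()
    for cls_p, *box_p in preds:
        for j, (cls_g, *box_g) in enumerate(gts):
            if j not in matched and cls_p == cls_g and box_iou(box_p, box_g) >= iou_thr:
                tp += 1
                matched.add(j)
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0
```

Because F1 is the harmonic mean of precision and recall, a model that over-predicts (hurting precision) or misses damage instances (hurting recall) is penalized symmetrically.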

Effect of final result

Fig.6 shows the detection results of our method on images from different countries. All the images are from the unlabeled test set of the RDD2022 dataset. Top row, from left to right: China motorbike, Czech, India. Bottom row, from left to right: Japan, Norway, the United States.

Fig. 6 Detection results on images from different countries predicted by our method

Acknowledgment

We would like to thank the organizers of ORDDC'2024 for providing the dataset and evaluation platform. We also acknowledge the support of Fujitsu Research & Development Center Co., Ltd.

References

[1] Optimized Road Damage Detection Challenge (ORDDC'2024). https://orddc2024.sekilab.global/.

[2] Z. Zong, G. Song, and Y. Liu, "DETRs with collaborative hybrid assignments training," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 6748-6758.

[3] C. Lyu, W. Zhang, H. Huang, et al., "RTMDet: An empirical study of designing real-time object detectors," arXiv preprint arXiv:2212.07784, 2022.

[4] A. Wang, H. Chen, L. Liu, et al., "YOLOv10: Real-time end-to-end object detection," arXiv preprint arXiv:2405.14458, 2024.