DESIGN OF AN AUTOMATIC CONTROL AND NAVIGATION SYSTEM FOR UAVs WITH COMPUTER-VISION-BASED CORRECTION
https://doi.org/10.33815/2313-4763.2025.2.31.077-089
Abstract
The article is devoted to the design of an automatic control and navigation system for unmanned aerial vehicles (UAVs) with trajectory correction based on computer vision. The architecture of a machine-vision system for autonomous navigation under GPS-denied conditions is considered; it is based on visual recognition of ground landmarks and multimodal fusion of data from cameras and inertial sensors. The visual navigation algorithm is presented in detail: the YOLOv8 object detector recognizes key terrain landmarks, which are then matched against a reference topographic base map by the least-squares method. Methods of real-time visual data processing for determining the UAV position and orientation relative to ground objects are described. The integration of the computer-vision system with a model predictive control (MPC) loop is analyzed as a means of accurate trajectory correction based on visual observations. Special attention is paid to algorithms for fusing visual and inertial measurements with extended and unscented Kalman filters, which improve the robustness of the navigation system. The presented simulation and experimental results confirm that the proposed hybrid system, combining YOLOv8-based ground landmark detection, optical character recognition (OCR) of address plates, and data fusion in a Kalman filter, provides stable UAV positioning in the absence of a GPS signal with an error of about 10–15 m and without drift accumulation. Integrating visual navigation with an MPC loop with event-based activation makes it possible to reduce the computational load on onboard resources without degrading control quality. The proposed technical solutions can serve as a basis for building robust UAV navigation and control systems for GNSS-constrained urban and tactical scenarios.
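For illustration, a minimal sketch of the least-squares matching step follows. It assumes that detected landmarks have already been associated with entries in the reference map and projected onto the ground plane (from pixel coordinates, altitude, and camera intrinsics); the function and variable names are illustrative, not taken from the article. Under these assumptions the position-and-heading fit reduces to a classic 2D rigid alignment solved in closed form via SVD (Kabsch/Umeyama without scale):

```python
import numpy as np

def estimate_pose_2d(body_offsets, map_points):
    """Least-squares 2D rigid alignment of landmark observations to a map.

    body_offsets : (N, 2) ground-plane offsets of detected landmarks
                   relative to the UAV, in the body frame.
    map_points   : (N, 2) coordinates of the same landmarks in the
                   reference topographic base map.
    Returns (position, heading) minimizing
    sum_i || map_i - (R(heading) @ offset_i + position) ||^2.
    """
    b = np.asarray(body_offsets, float)
    m = np.asarray(map_points, float)
    b_c, m_c = b - b.mean(0), m - m.mean(0)   # center both point sets
    H = b_c.T @ m_c                           # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflection
    R = Vt.T @ np.diag([1.0, d]) @ U.T        # optimal rotation
    position = m.mean(0) - R @ b.mean(0)      # optimal translation
    heading = np.arctan2(R[1, 0], R[0, 0])
    return position, heading

# Synthetic check: UAV at (100, 200) with 90 deg heading.
offsets = [(10, 0), (0, 10), (-5, 5)]
landmarks = [(100, 210), (90, 200), (95, 195)]
print(estimate_pose_2d(offsets, landmarks))   # -> (array([100., 200.]), pi/2)
```

With three or more non-collinear matched landmarks the solution is unique, and additional landmarks simply overdetermine the fit, averaging out individual detection errors.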
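The visual–inertial fusion can likewise be sketched in minimal form. The filter below uses a planar constant-velocity model, which is linear, so it is an ordinary Kalman filter; the extended and unscented variants discussed in the article apply the same predict/update cycle to nonlinear attitude and measurement models. All noise values and names here are illustrative assumptions, not the article's tuning:

```python
import numpy as np

class VisualInertialKF:
    """Minimal planar Kalman filter, state [x, y, vx, vy].
    Inertial propagation runs between camera frames; visual position
    fixes (from the landmark-matching step) correct accumulated drift."""

    def __init__(self, pos, pos_var=225.0):
        self.x = np.array([pos[0], pos[1], 0.0, 0.0])
        self.P = np.diag([pos_var, pos_var, 1.0, 1.0])

    def predict(self, accel, dt, q=0.5):
        F = np.eye(4)
        F[0, 2] = F[1, 3] = dt                 # position integrates velocity
        self.x = F @ self.x
        self.x[2:] += np.asarray(accel) * dt   # accel assumed in nav frame
        self.P = F @ self.P @ F.T + q * np.eye(4)

    def update(self, vis_pos, r=100.0):        # r ~ (10 m)^2 fix variance
        H = np.zeros((2, 4))
        H[0, 0] = H[1, 1] = 1.0                # measure position only
        S = H @ self.P @ H.T + r * np.eye(2)
        K = self.P @ H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + K @ (np.asarray(vis_pos) - H @ self.x)
        self.P = (np.eye(4) - K @ H) @ self.P
```

Because each visual fix is an absolute position measurement, the update step bounds the covariance growth of the inertial prediction, which is the mechanism behind the drift-free 10–15 m accuracy reported above.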
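Finally, the event-based activation of the MPC loop can be captured by a simple trigger. The rule below is a plausible heuristic assumed for illustration, not the article's exact condition: re-solve the optimization only when the fused estimate deviates from the reference trajectory by more than a bound, or after a maximum dwell time, and otherwise keep applying the previously computed plan:

```python
import numpy as np

def mpc_trigger(estimate, reference, last_solve_t, t,
                dev_threshold=5.0, max_period=2.0):
    """Event-based MPC activation (illustrative). Returns True when the
    controller should re-optimize: either the deviation between the fused
    state estimate and the reference exceeds dev_threshold (meters), or
    max_period (seconds) has elapsed since the last solve."""
    deviation = np.linalg.norm(np.asarray(estimate) - np.asarray(reference))
    return deviation > dev_threshold or (t - last_solve_t) > max_period
```

Bounding the dwell time in addition to the deviation test is a common safeguard in event-triggered MPC schemes: it limits how stale the last plan can become while still skipping most re-optimizations, which is how the computational load on onboard resources is reduced without degrading control quality.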
