
YOLO vs GPT: Large Language Model Surprisingly Good at Detecting Drones in VR Experiments

  • Writer: ImAFUSA
  • 3 days ago

Figure 1: Drone Detection in VR environment with YOLOv8.

Researchers at ImAFUSA partner ICCS-NTUA explored how well modern AI can detect drones in immersive virtual environments, which are used to study human attention and perception in both rural and urban settings.


Using data from the ImAFUSA VR experiments conducted by colleagues at ISCTE, researchers from ICCS trained a custom drone detection model based on the YOLOv8 architecture.


YOLO (‘You Only Look Once’) architectures are widely used in real-time object detection tasks, making them well-suited for identifying drones in complex visual environments.


Yet despite its sophistication, YOLO struggled with small, pixel-sized drones scattered in high-resolution frames. Even with high-end GPUs and advanced training techniques, including transfer learning, label refinement, adaptive resolution scaling, and real-time metric monitoring, the model tended to underestimate drone counts.
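
To give a feel for why pixel-sized objects are hard, the short sketch below works through the arithmetic of resizing a frame before detection. All numbers are illustrative assumptions, not values from the study: the article does not state the frame resolution, the drone size in pixels, or the input size used for training. The helper name `scaled_size` is hypothetical.

```python
# Illustrative arithmetic only. Frame width, drone width and detector
# input widths below are assumptions, not figures from the study.

def scaled_size(object_px: float, frame_width: int, model_width: int) -> float:
    """Width in pixels of an object after the frame is resized to model_width."""
    return object_px * model_width / frame_width


if __name__ == "__main__":
    frame_width = 3840   # assumed high-resolution VR frame width
    drone_px = 6         # assumed drone width in the original frame
    for model_width in (640, 1280, 3840):
        w = scaled_size(drone_px, frame_width, model_width)
        print(f"input width {model_width}: drone spans about {w:.1f} px")
```

Under these assumptions, a drone that covers a handful of pixels in the original frame shrinks to roughly one pixel at a small input width, which leaves a detector very little signal to work with.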


For comparison, the ICCS team evaluated GPT-4o's ability to estimate the number of drones in VR video frames. Surprisingly, the LLM significantly outperformed YOLO, aligning much more closely with the ground truth (r = 0.91 for GPT-4o vs 0.53 for YOLO).
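
The sketch below shows one way such an agreement score can be computed, assuming that r denotes a Pearson correlation between per-frame estimated counts and ground-truth counts (the article does not say which coefficient was used). The counts are made up for illustration and are not data from the study; the function names are hypothetical. A mean signed error is included as well, because a correlation on its own does not reveal systematic undercounting.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_signed_error(predicted, truth):
    """Average of (predicted - truth); negative values mean undercounting."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p - t for p, t in zip(predicted, truth)) / len(predicted)


if __name__ == "__main__":
    # Made-up per-frame drone counts, for illustration only.
    ground_truth = [1, 3, 5, 8, 12]
    estimator_a = [1, 2, 5, 7, 11]   # hypothetical estimator tracking the truth
    estimator_b = [2, 1, 3, 1, 4]    # hypothetical estimator that undercounts

    for name, est in (("A", estimator_a), ("B", estimator_b)):
        r = pearson_r(est, ground_truth)
        bias = mean_signed_error(est, ground_truth)
        print(f"estimator {name}: r = {r:.2f}, mean signed error = {bias:.2f}")
```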


The takeaway: while object detection models shine in many use cases, LLMs can offer a robust alternative for tasks like visual scene understanding, especially when training data is limited or objects are extremely small.


This raises exciting possibilities at the intersection of vision models and language models for future AI-driven perception systems.


ICCS's detailed study has been submitted for publication in Human Computer Interaction International (HCII) 2025.




This project is co-funded by the European Union under Grant Agreement No. 101114776 and supported by the SESAR 3 Joint Undertaking and its founding members.

Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or SESAR 3 JU. Neither the European Union nor the granting authority can be held responsible for them.


Dissemination Email: imafusa@futureneeds.eu

© ImAFUSA All rights reserved
