New AI model helps cities analyse traffic risks at scale

25 November 2025

by William Thorpe

Researchers at NYU Tandon School of Engineering have developed a multimodal AI system that analyses traffic-camera footage to detect collisions and near-misses, providing cities with a way to draw safety insights from existing video networks without extensive manual review.

The system, SeeUnsafe, has been recognised with New York City’s Vision Zero Research Award and is published in the journal Accident Analysis & Prevention. It applies multimodal large language models (MLLMs) to long-form traffic footage, aiming to support more continuous analysis than traditional crash-based approaches.

Many cities now collect extensive traffic video but still rely on limited data sources because reviewing footage is labour-intensive. Speaking to Cities Today, Kaan Ozbay, senior author of the SeeUnsafe paper and director of the C2SMART Transportation Research Center at NYU Tandon, said this has created a persistent gap.


“Cities have rapidly expanded their traffic-camera networks, but their ability to learn from all that video hasn’t kept up,” he said. “Manual video analysis is extremely time-consuming and expensive, and custom computer-vision pipelines require specialised engineering, site-specific tuning, and ongoing maintenance.”

Ozbay said this has limited the practical value of large video archives for day-to-day decision-making.

“This gap is why ‘more data leads to safer roads’ hasn’t become reality,” he added. “The advent of MLLMs brings about a paradigm shift, enabling continuous video to be transformed into automated, scalable safety insights.”

The model combines object-level signals with contextual reasoning. Unlike traditional pipelines that separate tasks such as detection, tracking and speed estimation, it interprets traffic scenes as a whole.

“SeeUnsafe uses MLLMs to understand traffic scenes more like a human safety engineer, rather than as a collection of separate vision tasks,” Ozbay said. “This kind of scene-level understanding is essential for spotting rare but important events like collisions and near-misses that classical CCTV often struggles to capture.”
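To give a flavour of what scene-level analysis looks like in practice, the short Python sketch below samples frames from a clip and hands them, with a single analyst-style prompt, to a multimodal model in one pass. This is illustrative only, not SeeUnsafe’s implementation: the query_mllm function and the prompt wording are hypothetical placeholders for whichever MLLM endpoint an agency might use.

```python
# A minimal sketch of scene-level clip analysis with an MLLM.
# Not the authors' code; query_mllm() is a hypothetical stand-in.
import cv2

PROMPT = (
    "You are a traffic safety analyst. Given these frames from a fixed "
    "traffic camera, classify the clip as collision, near-miss or normal, "
    "and briefly describe the vehicle and pedestrian movements."
)

def sample_frames(video_path: str, every_n: int = 30) -> list:
    """Keep every n-th frame so a long clip fits in the model's context."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

def query_mllm(prompt: str, frames: list) -> str:
    """Placeholder for a call to any multimodal LLM API.

    Encodes frames as JPEG bytes, as most vision APIs expect, then
    returns a dummy string so the sketch runs without credentials.
    """
    payload = [cv2.imencode(".jpg", f)[1].tobytes() for f in frames]
    return f"(stub) would send {len(payload)} frames with prompt: {prompt}"

def classify_clip(video_path: str) -> str:
    # The whole scene is judged in one pass, rather than chaining
    # separate detection, tracking and speed-estimation modules.
    return query_mllm(PROMPT, sample_frames(video_path))
```

The key design difference from a classical pipeline is visible in classify_clip: there is no hand-off between detection, tracking and speed-estimation stages, so rare events that fall between those task boundaries are less likely to be missed.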

Testing on the Toyota Woven traffic-safety dataset showed accuracy levels above those of comparison models. However, Ozbay said results from controlled datasets do not automatically translate to the complexity of real-world urban environments.

“SeeUnsafe’s accuracy is already promising in controlled settings, but there is still a gap before it is ready for fully reliable citywide deployment in a place like New York,” he said. Factors such as lighting, weather, camera position and unusual behaviour can affect performance.

The system also produces natural-language summaries describing the conditions and movements associated with detected events. Ozbay said this format helps integrate the tool into existing public-sector workflows.

“This system lowers the barrier between users and analytical outcomes by translating huge volumes of complex, unstructured video into insights that non-technical stakeholders can immediately understand,” he said. “It fits existing workflows, reduces the learning curve, and helps agencies adopt AI-assisted safety analysis without having to change how they already review and justify decisions.”
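As a rough illustration of how such summaries could slot into an existing review queue, the hypothetical Python snippet below packages one detected event as a structured record alongside its plain-language description. The field names are invented for the example and are not SeeUnsafe’s actual output schema.

```python
# Hypothetical packaging of one detected event for an agency review queue.
from dataclasses import dataclass, asdict
import json

@dataclass
class SafetyEvent:
    camera_id: str    # invented field names, for illustration only
    timestamp: str
    event_type: str   # e.g. "collision" or "near-miss"
    summary: str      # the model's plain-language description

def to_report_row(event: SafetyEvent) -> str:
    """Serialise one event as a JSON line for an existing review workflow."""
    return json.dumps(asdict(event))

print(to_report_row(SafetyEvent(
    camera_id="cam-042",
    timestamp="2025-11-20T17:42:00Z",
    event_type="near-miss",
    summary="Southbound SUV braked sharply as a cyclist crossed against "
            "the signal; no contact observed.",
)))
```

Keeping the human-readable summary alongside machine-readable fields is what would let non-technical reviewers act on the output without new tooling, in line with Ozbay’s point about fitting existing workflows.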

Ozbay added that the same technology could be used in wider mobility-planning or operational contexts as cities strengthen their data infrastructure.

“We see fixed traffic cameras as the ‘0 to 1’ step,” he explained. “Once that foundation is in place, the same core technology can naturally extend beyond fixed cameras.”

The long-term goal is to support a clearer understanding of how risk develops across road networks.

“We’re already seeing parallel efforts in urban mobility and autonomous driving, and SeeUnsafe is positioned to plug into that ecosystem as the ‘traffic safety intelligence layer’ that connects real-world video, virtual models, and operational decisions,” he said.

Main image: Juliengrondin | Dreamstime.com