Synchronized first-person video, gaze tracking, and IMU telemetry // perception training · robotics · driver monitoring · wearable AI
RT-Fusion delivers the human perspective data that no vehicle, robot, or fixed camera can generate — synchronized first-person video, gaze direction, and 200Hz IMU telemetry. For autonomous driving, robotics, driver monitoring, and wearable AI. EU-native acquisition across 6 countries.
Operational Capacity: 4h+ continuous World-View (GoPro 5.3K) combined with Gaze-View (Ray-Ban Meta) — audio-synced, running in parallel.
BUILT FOR:
Resolution: 5.3K
Telemetry: 200Hz IMU
Scenario: Real-World Weather
Whether you build for autonomous vehicles, robotics, driver monitoring, or wearable AI — your model needs human perspective data that no vehicle, robot, or fixed camera can generate. RT-Fusion captures this signal at human eye level, across real European environments, on-demand.
Bounding boxes don't show intent. A pedestrian's gaze direction 400ms before a crossing decision is invisible to vehicle and robot sensors. RT-Fusion captures this signal at human eye level.
A robot can record its own telemetry, but it can't record what a human body does while performing the same task. First-person video with gaze and IMU from real human activity — the training data imitation learning needs.
IR cabin cameras approximate driver gaze. RT-Fusion captures visible-light gaze ground truth on real European roads — where the driver actually looked when the traffic event occurred.
GoPro-on-helmet footage differs from smart-glasses footage in perspective, motion, and social context. RT-Fusion captures on the exact Ray-Ban Meta device that wearable AI products ship on.
A pedestrian wears the rig in real traffic. The Ray-Ban Meta captures where they look before stepping off a kerb — gaze direction, head-pose, social signaling. The GoPro captures synchronized world-view context with 200Hz IMU and GPS. Together, this produces the intent signal that no vehicle-mounted or robot-mounted sensor can generate: what the human was attending to before they acted. Delivered as time-stamped MP4 + GPMF telemetry, ingestible via PyTorch DataLoader, ROS 2 bag, or OpenCV.
A person wears the rig while performing physical tasks — climbing stairs, opening doors, navigating terrain, carrying objects. The Ray-Ban Meta captures first-person POV with natural head movement and gaze direction. The GoPro captures synchronized wider context with 200Hz IMU telemetry. This produces exactly what imitation learning, behavior cloning, and VLA architectures need: large-scale egocentric human demonstration video showing not just what the person did, but where they looked while doing it. No robot fleet can generate this data — it requires a human performing the task.
The driver wears Ray-Ban Meta glasses while driving real routes. The glasses capture exactly where the driver looks — road ahead, mirrors, phone, passengers, blind spot checks, gaze shifts at intersections and roundabouts. The GoPro on the dashboard captures the road scene simultaneously. Together, this produces synchronized driver gaze ground truth paired with traffic context, on real European roads, at a fraction of instrumented vehicle rig cost. No IR cabin camera provides this level of visible-light gaze fidelity with scene context.
A person wears the Ray-Ban Meta doing everyday activities — navigating city streets, shopping, commuting, socializing. The glasses capture first-person POV from the exact consumer device that Meta and its ecosystem partners are building for. This is not GoPro-on-helmet footage — the perspective, motion patterns, social context, and field of view match the actual user experience of smart glasses. The synchronized GoPro adds wider context with 200Hz IMU and GPS telemetry for spatial grounding.
Chest-mounted GoPro (world-view) + head-worn Ray-Ban Meta (gaze-view), running in parallel with audio-synced timestamps.
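The sync can be verified or re-derived offline by cross-correlating the two audio tracks. A minimal sketch, assuming both tracks have first been extracted to mono 16kHz WAV (e.g. with ffmpeg); file names mirror the sample metadata and are illustrative only:

# Sketch: estimate the offset between the two recordings from their
# shared audio. Assumes both MP4 audio tracks were extracted to mono
# 16kHz WAV beforehand; file names are illustrative, not delivery spec.
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

RATE = 16000

def load_mono(path):
    rate, samples = wavfile.read(path)
    assert rate == RATE, f"resample {path} to {RATE} Hz first"
    if samples.ndim > 1:
        samples = samples.mean(axis=1)  # stereo -> mono
    samples = samples.astype(np.float64)
    return samples / (np.abs(samples).max() + 1e-9)

world = load_mono("GH010492.wav")  # GoPro world-view audio
gaze = load_mono("RM010492.wav")   # Ray-Ban Meta gaze-view audio

# The cross-correlation peak gives the lag (in samples) of the
# gaze-view track relative to the world-view track.
corr = correlate(world, gaze, mode="full")
lag = np.argmax(corr) - (len(gaze) - 1)
print(f"gaze-view offset: {lag / RATE:+.3f}s relative to world-view")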
HARDWARE: GOPRO HERO 13 (CUSTOM ACQUISITION RIG)
Captures the "World Model." High dynamic range handles the "Tunnel Exit" blinding light problem. Rolling shutter stress-tests VIO pipelines against vibration artifacts.
HARDWARE: RAY-BAN META GEN 2
Captures the "Agent Model." Solves the high-density VRU (vulnerable road user) problem by recording the eye-contact negotiation and intent signaling that LiDAR cannot see.
7 scenario folders, each containing synchronized paired sensor output. Raw sensor data — no stabilization, no grading. Each folder is relevant to multiple buyer use cases.
RT-Fusion delivers structured, time-synchronized assets. Every frame is mapped to IMU telemetry and operator head-pose, enabling direct ingestion into standard machine learning and robotics pipelines.
{
"timestamp_utc": "2026-02-11T09:14:22.045Z",
"frame_id": 4920,
"environment": {
"location": "NL_Amsterdam_Canal_District",
"weather": "overcast_diffuse",
"surface": "asphalt_bike_lane"
},
"telemetry": {
"imu_accel_x_y_z": [0.02, -0.81, 0.15],
"speed_mps": 5.8
},
"sensors": {
"world_cam_file": "GH010492.MP4",
"attention_cam_file": "RM010492.MP4",
"head_pose_proxy": true
}
}
All assets delivered as time-stamped MP4 + GPMF telemetry, directly ingestible via ROS 2 bag conversion or PyTorch DataLoader.
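As a minimal sketch of the PyTorch path, assuming one JSON sidecar per scene in the schema above; the directory layout and "scene.json" file name are illustrative:

# Sketch: a PyTorch Dataset that pairs world-view frames with the
# per-scene JSON metadata shown above. Field names follow the sample
# record. Frame-level IMU comes from the GPMF track and is omitted
# here for brevity.
import json
from pathlib import Path

import cv2
import torch
from torch.utils.data import DataLoader, Dataset

class RTFusionScene(Dataset):
    def __init__(self, scene_dir):
        self.scene_dir = Path(scene_dir)
        self.meta = json.loads((self.scene_dir / "scene.json").read_text())
        video = self.scene_dir / self.meta["sensors"]["world_cam_file"]
        self.cap = cv2.VideoCapture(str(video))
        self.n_frames = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT))

    def __len__(self):
        return self.n_frames

    def __getitem__(self, idx):
        # Random seek is simple but slow; decode sequentially in practice.
        self.cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = self.cap.read()
        if not ok:
            raise IndexError(idx)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
        accel = torch.tensor(self.meta["telemetry"]["imu_accel_x_y_z"])
        return {"image": image, "imu_accel": accel}

loader = DataLoader(RTFusionScene("NL_Amsterdam_Canal_District/scene_0001"), batch_size=8)

For multi-worker loading, open the VideoCapture lazily inside each worker rather than in __init__, since decoder handles do not survive process forking.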
Optimized For Standard Engineering Pipelines
CREDENTIALS // METHODOLOGY
ARTY ZUEV
10+ years in professional media production (camera systems, color science, lighting, and post-production) across commercial, documentary, and marketing projects in the EU. When the industry shifted from language models to real-world perception, a critical gap emerged: companies building autonomous systems in Europe had no dedicated, on-demand source for the human perspective data that no vehicle, robot, or fixed camera can generate. RT-Fusion was built to close that gap, applying professional acquisition methodology to capture synchronized first-person video, gaze tracking, and IMU telemetry across autonomous driving, robotics, driver monitoring, and wearable AI use cases.
FROM BRIEF TO PIPELINE-READY DATASET
You specify target scenarios, locations, and environmental conditions. Campaign scoped per acquisition day.
Dual-sensor rig deploys to target location. GoPro 5.3K World-View + Ray-Ban Meta Gaze-View running in parallel. 4h+ continuous acquisition.
Time-stamped MP4 + GPMF telemetry, paired with JSON metadata per scene. All clips indexed by scenario category and sensor config.
Convert directly to ROS 2 bag via rosbag2, or load into a PyTorch DataLoader. GPMF telemetry parsed with gopro2gpx. Zero custom tooling required.
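A minimal sketch of the rosbag2 side, assuming IMU samples have already been extracted from the GPMF track as (timestamp_ns, ax, ay, az) tuples; the topic name is illustrative:

# Sketch: write extracted 200Hz IMU samples into a ROS 2 bag with
# rosbag2_py. Assumes samples were already parsed out of the GPMF
# track; the "/rt_fusion/imu" topic name is illustrative.
import rosbag2_py
from rclpy.serialization import serialize_message
from sensor_msgs.msg import Imu

def write_imu_bag(samples, bag_uri="rt_fusion_imu"):
    writer = rosbag2_py.SequentialWriter()
    writer.open(
        rosbag2_py.StorageOptions(uri=bag_uri, storage_id="sqlite3"),
        rosbag2_py.ConverterOptions("", ""),
    )
    writer.create_topic(rosbag2_py.TopicMetadata(
        name="/rt_fusion/imu",
        type="sensor_msgs/msg/Imu",
        serialization_format="cdr",
    ))
    for t_ns, ax, ay, az in samples:
        msg = Imu()
        msg.header.stamp.sec = t_ns // 1_000_000_000
        msg.header.stamp.nanosec = t_ns % 1_000_000_000
        msg.linear_acceleration.x = ax
        msg.linear_acceleration.y = ay
        msg.linear_acceleration.z = az
        writer.write("/rt_fusion/imu", serialize_message(msg), t_ns)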
/// DIRECT ENGINEERING FEED
Direct line to Engineering. No sales agents.
Prefer async? rt@rt-fusion.com
— or submit a full brief below:
ENCRYPTION: PGP-4096 // CONNECTION: SECURE