Sensor Dataset
Table of Contents
- Overview
- Sensor Suite
- Dataset Structure Format
- Annotations
- Pose
- LiDAR Sweeps
- Calibration
- Intrinsics
- Log Distribution Across Cities
- Privacy
- Sensor Dataset splits
- Sensor Dataset Taxonomy
Overview
The Argoverse 2 Sensor Dataset is the successor to the Argoverse 1 3D Tracking Dataset. AV2 is larger, with 1,000 scenes totalling 4.2 hours of driving data, up from 113 scenes in Argoverse 1.
The total dataset amounts to 1 TB of data in its extracted form. Each vehicle log is approximately 15 seconds in duration and 1 GB in size, including ~150 LiDAR sweeps on average, and ~300 images from each of the 9 cameras (~2700 images per log).
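The headline figures above follow from the log length and the sensor rates given in the Sensor Suite section (10 Hz LiDAR, 20 fps cameras). The short check below is only a back-of-the-envelope sketch; the constant names are illustrative and not part of any API.

```python
# Back-of-the-envelope check of the dataset figures quoted above.
NUM_LOGS = 1000        # scenes in the AV2 Sensor Dataset
LOG_DURATION_S = 15    # approximate duration of each log, in seconds
LIDAR_RATE_HZ = 10     # LiDAR sweep rate
CAMERA_RATE_HZ = 20    # frame rate of every camera
NUM_CAMERAS = 9        # 7 ring cameras + 2 stereo cameras

total_hours = NUM_LOGS * LOG_DURATION_S / 3600
sweeps_per_log = LIDAR_RATE_HZ * LOG_DURATION_S
images_per_camera = CAMERA_RATE_HZ * LOG_DURATION_S
images_per_log = images_per_camera * NUM_CAMERAS

print(f"{total_hours:.1f} h, {sweeps_per_log} sweeps, {images_per_log} images per log")
```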
Sensor Suite
LiDAR sweeps are collected at 10 Hz, along with 20 fps imagery from 7 ring cameras positioned to provide a fully panoramic field of view, and 20 fps imagery from 2 stereo cameras. In addition, camera intrinsics, extrinsics, and 6-DOF ego-vehicle pose in a global coordinate system are provided. LiDAR returns are captured by two 32-beam LiDARs, spinning at 10 Hz in the same direction but separated in orientation by 180°. The cameras trigger in sync with both LiDARs, leading to a 20 Hz frame rate. The nine global-shutter cameras are synchronized to the LiDAR so that their exposure is centered on the LiDAR sweeping through their fields of view.
We aggregate all returns from the two stacked 32-beam sensors into a single sweep. These sensors each have different, overlapping fields of view. Both LiDARs have their own reference frame, and we refer to them as up_lidar and down_lidar, respectively. We have egomotion-compensated the LiDAR sensor data to the egovehicle reference nanosecond timestamp. All LiDAR returns are provided in the egovehicle reference frame, not the individual LiDAR reference frame.
Imagery is provided at (height x width) of 2048 x 1550 (portrait orientation) for the ring front-center camera, and at 1550 x 2048 (landscape orientation) for the other 8 cameras (including the stereo cameras). All camera imagery is provided in an undistorted format.
Dataset Structure Format
Tabular data (annotations, LiDAR sweeps, poses, calibration) are provided as Apache Feather files with the file extension .feather. We show examples below.
Annotations
Object annotations are provided as 3D cuboids. Their pose is provided in the egovehicle’s reference frame.
>>> io_utils.read_feather("{AV2_ROOT}/01bb304d-7bd8-35f8-bbef-7086b688e35e/annotations.feather")
timestamp_ns track_uuid category length_m width_m height_m qw qx qy qz tx_m ty_m tz_m num_interior_pts
0 315968867659956000 022c398c... BOLLARD 0.363046 0.222484 0.746710 0.68 0.0 0.0 0.72 25.04 -2.55 0.01 10
1 315968867659956000 12361d61... BOLLARD 0.407004 0.206964 0.792624 0.68 0.0 0.0 0.72 34.13 -2.51 -0.05 5
2 315968867659956000 12cac1ed... BOLLARD 0.337859 0.227949 0.747096 0.70 0.0 0.0 0.71 21.99 -2.55 0.03 13
3 315968867659956000 173910b2... BOLLARD 0.326865 0.204709 0.809859 0.71 0.0 0.0 0.69 3.79 -2.53 0.05 16
4 315968867659956000 23716fb2... BOLLARD 0.336697 0.226178 0.820867 0.72 0.0 0.0 0.69 6.78 -2.52 0.04 19
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
18039 315968883159714000 c48fc856... STROLLER 0.581798 0.502284 0.991001 0.97 0.0 0.0 -0.22 -10.84 34.33 0.14 13
18040 315968883159714000 cf1c4301... TRUCK 9.500000 3.010952 3.573860 -0.51 0.0 0.0 0.85 -26.97 0.09 1.41 1130
18041 315968883159714000 a834bc72... TRUCK_CAB 9.359874 3.260000 4.949222 0.51 0.0 0.0 0.85 138.13 13.39 0.80 18
18042 315968883159714000 ff50196f... VEHICULAR_TRAILER 3.414590 2.658412 2.583414 0.84 0.0 0.0 0.52 -13.95 8.32 1.28 533
18043 315968883159714000 a748a5c4... WHEELED_DEVICE 1.078700 0.479100 1.215600 0.72 0.0 0.0 0.69 19.17 -6.07 0.28 7
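To make the cuboid parameterization concrete, the sketch below turns one annotation row into a rotation matrix, a heading angle, and the eight cuboid corners in the egovehicle frame. It is a NumPy-only illustration, not code from the av2 API: the helper names are hypothetical, and it assumes that length_m, width_m and height_m lie along the object's x, y and z axes and that (tx_m, ty_m, tz_m) is the cuboid center.

```python
import numpy as np


def quat_wxyz_to_rotmat(qw, qx, qy, qz):
    """Convert a scalar-first quaternion (qw, qx, qy, qz) to a 3x3 rotation matrix."""
    q = np.array([qw, qx, qy, qz], dtype=float)
    qw, qx, qy, qz = q / np.linalg.norm(q)  # re-normalize: the printed values are rounded
    return np.array([
        [1 - 2 * (qy**2 + qz**2), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx**2 + qz**2), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx**2 + qy**2)],
    ])


def cuboid_corners_ego(length_m, width_m, height_m, rotmat, center_xyz):
    """Return the 8 cuboid corners (8x3) in the egovehicle frame.

    Assumption: length/width/height lie along the object's x/y/z axes and
    center_xyz is the cuboid center.
    """
    signs = np.array([[sx, sy, sz] for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)])
    corners_obj = 0.5 * signs * np.array([length_m, width_m, height_m])
    return corners_obj @ rotmat.T + np.asarray(center_xyz, dtype=float)


# Row 0 of the example table (a BOLLARD).
R = quat_wxyz_to_rotmat(0.68, 0.0, 0.0, 0.72)
center = (25.04, -2.55, 0.01)
corners = cuboid_corners_ego(0.363046, 0.222484, 0.746710, R, center)
yaw_rad = np.arctan2(R[1, 0], R[0, 0])  # rotation about the egovehicle z-axis

print(corners.shape, round(float(np.degrees(yaw_rad)), 1))
```

Because qx and qy are zero in the rows shown, the rotation is a pure yaw; the general formula is used so the helper also handles cuboids with pitch or roll.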
Pose
6-DOF ego-vehicle pose in a global (city) coordinate system is provided (visualized in the figure below as a red line, with red circles indicated at a 1 Hz frequency):
We refer to this pose as city_SE3_egovehicle throughout the codebase:
>>> io_utils.read_feather("{AV2_ROOT}/54bc6dbc-ebfb-3fba-b5b3-57f88b4b79ca/city_SE3_egovehicle.feather")
timestamp_ns qw qx qy qz tx_m ty_m tz_m
0 315968112437425433 -0.740565 -0.005635 -0.006869 -0.671926 747.405602 1275.325609 -24.255610
1 315968112442441182 -0.740385 -0.005626 -0.006911 -0.672124 747.411245 1275.385425 -24.255906
2 315968112449927216 -0.740167 -0.005545 -0.006873 -0.672365 747.419676 1275.474686 -24.256406
3 315968112449927217 -0.740167 -0.005545 -0.006873 -0.672365 747.419676 1275.474686 -24.256406
4 315968112457428271 -0.739890 -0.005492 -0.006953 -0.672669 747.428448 1275.567576 -24.258680
... ... ... ... ... ... ... ... ...
2692 315968128362451249 -0.694376 -0.001914 -0.006371 -0.719582 740.163738 1467.061503 -24.546971
2693 315968128372412943 -0.694326 -0.001983 -0.006233 -0.719631 740.160489 1467.147020 -24.545918
2694 315968128377482496 -0.694346 -0.001896 -0.006104 -0.719613 740.158684 1467.192399 -24.546316
2695 315968128387425439 -0.694307 -0.001763 -0.005998 -0.719652 740.155543 1467.286735 -24.549918
2696 315968128392441187 -0.694287 -0.001728 -0.005945 -0.719672 740.153742 1467.331549 -24.550363
[2697 rows x 8 columns]
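A pose row can be read as a rigid transform that maps points from the egovehicle frame into the city frame. The sketch below builds the corresponding 4x4 homogeneous matrix from row 0 of the table and applies it to two points. The function name is hypothetical and the code uses plain NumPy rather than the av2 library's own SE(3) utilities.

```python
import numpy as np


def se3_from_quat_trans(qw, qx, qy, qz, tx, ty, tz):
    """Build a 4x4 homogeneous transform from a scalar-first quaternion and a translation."""
    q = np.array([qw, qx, qy, qz], dtype=float)
    qw, qx, qy, qz = q / np.linalg.norm(q)  # guard against rounding in the stored values
    T = np.eye(4)
    T[:3, :3] = [
        [1 - 2 * (qy**2 + qz**2), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx**2 + qz**2), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx**2 + qy**2)],
    ]
    T[:3, 3] = [tx, ty, tz]
    return T


# Row 0 of the example pose table.
city_SE3_egovehicle = se3_from_quat_trans(
    -0.740565, -0.005635, -0.006869, -0.671926, 747.405602, 1275.325609, -24.255610
)

# Two points in the egovehicle frame: the origin and a point 10 m along +x.
points_ego = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
points_h = np.hstack([points_ego, np.ones((len(points_ego), 1))])  # homogeneous coordinates
points_city = (city_SE3_egovehicle @ points_h.T).T[:, :3]

print(points_city)
```

The egovehicle origin maps to (tx_m, ty_m, tz_m), and distances between points are preserved, as expected for a rigid transform.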
LiDAR Sweeps
The format of an example sweep, sensors/lidar/15970913559644000.feather, is shown below (the sweep has a reference timestamp of 15970913559644000 nanoseconds):
x y z intensity laser_number offset_ns
0 -1.291016 2.992188 -0.229370 24 31 3318000
1 -25.921875 25.171875 0.992188 5 14 3318000
2 -15.500000 18.937500 0.901855 34 16 3320303
3 -3.140625 4.593750 -0.163696 12 30 3320303
4 -4.445312 6.535156 -0.109802 14 29 3322607
... ... ... ... ... ... ...
98231 18.312500 -38.187500 3.279297 26 50 106985185
98232 23.109375 -34.437500 3.003906 20 49 106987490
98233 4.941406 -5.777344 -0.162720 12 32 106987490
98234 6.640625 -8.257812 -0.157593 6 33 106989794
98235 20.015625 -37.062500 2.550781 12 47 106989794
[98236 rows x 6 columns]
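The columns above can be combined into a few commonly needed per-point quantities. The sketch below rests on two assumptions that the text above does not state: that offset_ns is a nanosecond offset added to the sweep's reference timestamp, and that laser_number values 0 to 31 belong to up_lidar while 32 to 63 belong to down_lidar. Treat both as hypotheses to verify against the official documentation.

```python
import numpy as np

sweep_timestamp_ns = np.int64(15970913559644000)  # taken from the example file name

# A few rows copied from the table above (x, y, z in meters, egovehicle frame).
xyz = np.array([
    [-1.291016, 2.992188, -0.229370],
    [-25.921875, 25.171875, 0.992188],
    [-15.500000, 18.937500, 0.901855],
    [18.312500, -38.187500, 3.279297],
    [4.941406, -5.777344, -0.162720],
    [20.015625, -37.062500, 2.550781],
])
laser_number = np.array([31, 14, 16, 50, 32, 47])
offset_ns = np.array(
    [3318000, 3318000, 3320303, 106985185, 106987490, 106989794], dtype=np.int64
)

# ASSUMPTION: absolute capture time = sweep reference timestamp + per-point offset.
point_timestamp_ns = sweep_timestamp_ns + offset_ns

# Distance of each return from the egovehicle origin.
range_m = np.linalg.norm(xyz, axis=1)

# ASSUMPTION: laser numbers 0-31 come from up_lidar, 32-63 from down_lidar.
is_up_lidar = laser_number < 32

print(point_timestamp_ns[0], range_m.round(2), is_up_lidar)
```

Integer (int64) arithmetic is used for the timestamps because values of this magnitude cannot all be represented exactly in float64.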
Calibration
An example calibration file is shown below, parameterizing egovehicle_SE3_sensor for each sensor (the sensor’s pose in the egovehicle coordinate system):
>>> io_utils.read_feather(f"{AV2_ROOT}/54bc6dbc-ebfb-3fba-b5b3-57f88b4b79ca/calibration/egovehicle_SE3_sensor.feather")
sensor_name qw qx qy qz tx_m ty_m tz_m
0 ring_front_center 0.502809 -0.499689 0.500147 -0.497340 1.631216 -0.000779 1.432780
1 ring_front_left 0.635526 -0.671957 0.275463 -0.262107 1.550015 0.197539 1.431329
2 ring_front_right 0.264354 -0.278344 0.671740 -0.633567 1.554057 -0.194171 1.430575
3 ring_rear_left 0.600598 -0.603227 -0.371096 0.371061 1.104117 0.124369 1.446070
4 ring_rear_right -0.368149 0.369885 0.603626 -0.602733 1.103432 -0.128317 1.428135
5 ring_side_left 0.684152 -0.724938 -0.058345 0.054735 1.310427 0.267904 1.433233
6 ring_side_right -0.053810 0.056105 0.727113 -0.682103 1.310236 -0.273345 1.435529
7 stereo_front_left 0.500421 -0.499934 0.501241 -0.498399 1.625085 0.248148 1.222831
8 stereo_front_right 0.500885 -0.503584 0.498793 -0.496713 1.633076 -0.250872 1.222173
9 up_lidar 0.999996 0.000000 0.000000 -0.002848 1.350180 0.000000 1.640420
10 down_lidar -0.000089 -0.994497 0.104767 0.000243 1.355162 0.000133 1.565252
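Since each row gives the sensor's pose in the egovehicle frame, moving a point from the egovehicle frame into a sensor frame requires the inverse transform. The sketch below does this for the ring_front_center row using plain NumPy; the helper names are hypothetical and not part of the av2 API.

```python
import numpy as np


def quat_wxyz_to_rotmat(qw, qx, qy, qz):
    """Convert a scalar-first quaternion (qw, qx, qy, qz) to a 3x3 rotation matrix."""
    q = np.array([qw, qx, qy, qz], dtype=float)
    qw, qx, qy, qz = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (qy**2 + qz**2), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx**2 + qz**2), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx**2 + qy**2)],
    ])


def ego_to_sensor(points_ego, R, t):
    """Apply the inverse of egovehicle_SE3_sensor: p_sensor = R^T (p_ego - t)."""
    return (np.asarray(points_ego, dtype=float) - t) @ R


def sensor_to_ego(points_sensor, R, t):
    """Apply egovehicle_SE3_sensor: p_ego = R p_sensor + t."""
    return np.asarray(points_sensor, dtype=float) @ R.T + t


# ring_front_center row of the example calibration table.
R = quat_wxyz_to_rotmat(0.502809, -0.499689, 0.500147, -0.497340)
t = np.array([1.631216, -0.000779, 1.432780])

# A point 20 m along the egovehicle +x axis, roughly at camera height.
points_ego = np.array([[20.0, 0.0, 1.4]])
points_cam = ego_to_sensor(points_ego, R, t)

print(points_cam.round(2))
```

For this camera the quaternion is close to (0.5, -0.5, 0.5, -0.5), which maps the camera's z-axis onto the egovehicle's x-axis, so the point ends up with a positive z (depth) in the camera frame.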
Intrinsics
An example camera intrinsics file is shown below:
>>> io_utils.read_feather("{AV2_ROOT}/54bc6dbc-ebfb-3fba-b5b3-57f88b4b79ca/calibration/intrinsics.feather")
sensor_name fx_px fy_px cx_px ... k2 k3 height_px width_px
0 ring_front_center 1773.504272 1773.504272 775.826693 ... -0.212167 0.328694 2048 1550
1 ring_front_left 1682.010713 1682.010713 1025.068254 ... -0.136984 0.209330 1550 2048
2 ring_front_right 1684.834479 1684.834479 1024.373455 ... -0.133341 0.208709 1550 2048
3 ring_rear_left 1686.494558 1686.494558 1025.655905 ... -0.129761 0.202326 1550 2048
4 ring_rear_right 1683.375120 1683.375120 1024.381124 ... -0.129331 0.201599 1550 2048
5 ring_side_left 1684.902403 1684.902403 1027.822264 ... -0.124561 0.196519 1550 2048
6 ring_side_right 1682.936559 1682.936559 1024.948976 ... -0.109515 0.179383 1550 2048
7 stereo_front_left 1685.825885 1685.825885 1025.830335 ... -0.113065 0.182441 1550 2048
8 stereo_front_right 1683.137591 1683.137591 1024.612074 ... -0.127301 0.198538 1550 2048
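Because the imagery is provided undistorted, a plain pinhole model is a reasonable way to illustrate how the intrinsics are used. The sketch below builds a 3x3 intrinsics matrix for ring_front_center and projects camera-frame points into pixel coordinates. Note that cy_px is elided in the table above, so the value used here (half the image height) is a placeholder and not the real calibration; the function name is likewise hypothetical.

```python
import numpy as np

# ring_front_center row of the example intrinsics table.
fx_px, fy_px, cx_px = 1773.504272, 1773.504272, 775.826693
height_px, width_px = 2048, 1550
cy_px = height_px / 2.0  # PLACEHOLDER: cy_px is not shown in the table above

K = np.array([
    [fx_px, 0.0, cx_px],
    [0.0, fy_px, cy_px],
    [0.0, 0.0, 1.0],
])


def project_pinhole(points_cam, K, width_px, height_px):
    """Project Nx3 camera-frame points to pixels; return (uv, visible mask)."""
    points_cam = np.asarray(points_cam, dtype=float)
    z = points_cam[:, 2]
    in_front = z > 0                      # points behind the camera cannot be imaged
    z_safe = np.where(in_front, z, 1.0)   # avoid dividing by zero or negative depth
    uv = (points_cam @ K.T)[:, :2] / z_safe[:, None]
    in_image = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width_px) & (uv[:, 1] >= 0) & (uv[:, 1] < height_px)
    )
    return uv, in_front & in_image


# Camera frame: +z is depth. One point straight ahead, one behind, one far off to the side.
points_cam = np.array([[0.0, 0.0, 10.0], [0.0, 0.0, -5.0], [100.0, 0.0, 10.0]])
uv, visible = project_pinhole(points_cam, K, width_px, height_px)

print(uv[visible], visible)
```

A point on the optical axis lands on the principal point (cx_px, cy_px); the pixel coordinates returned for points that are not visible are not meaningful and should be masked out.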
A local map is provided per log; please refer to the Map README for additional details.
Log Distribution Across Cities
Vehicle logs from the AV2 Sensor Dataset are captured in 6 cities, according to the following distribution:
- Austin, Texas: 31 logs.
- Detroit, Michigan: 117 logs.
- Miami, Florida: 354 logs.
- Pittsburgh, Pennsylvania: 350 logs.
- Palo Alto, California: 22 logs.
- Washington, D.C.: 126 logs.
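As a quick consistency check, the per-city counts above can be summed and converted into shares of the dataset. This is only an illustrative calculation using the numbers in the list; the dictionary name is arbitrary.

```python
# Log counts per city, copied from the list above.
logs_per_city = {
    "Austin": 31,
    "Detroit": 117,
    "Miami": 354,
    "Pittsburgh": 350,
    "Palo Alto": 22,
    "Washington, D.C.": 126,
}

total_logs = sum(logs_per_city.values())
share_percent = {city: 100.0 * n / total_logs for city, n in logs_per_city.items()}

# Print cities from most to least represented.
for city, n in sorted(logs_per_city.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{city:<18} {n:>4} logs  {share_percent[city]:5.1f}%")
print(f"{'Total':<18} {total_logs:>4} logs")
```

Miami and Pittsburgh together account for roughly 70% of the logs, which is worth keeping in mind when evaluating per-city generalization.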
Privacy
All faces and license plates, whether inside vehicles or outside of the drivable area, are blurred extensively to preserve privacy.
Sensor Dataset splits
We randomly partitioned 1000 logs into the following splits:
- Train (700 logs)
- Validation (150 logs)
- Test (150 logs)
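For readers who want to build their own partition with the same proportions, a random 700/150/150 split can be sketched as follows. This does not reproduce the official split: the function name, the seed, and the placeholder log IDs are all hypothetical, and the official split assignments should be taken from the released dataset itself.

```python
import random


def split_logs(log_ids, n_train=700, n_val=150, n_test=150, seed=0):
    """Randomly partition log IDs into disjoint train/val/test lists."""
    if n_train + n_val + n_test != len(log_ids):
        raise ValueError("split sizes must add up to the number of logs")
    shuffled = sorted(log_ids)              # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)   # local RNG: does not touch global random state
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


# Hypothetical placeholder IDs; real logs are identified by UUID-style directory names.
log_ids = [f"log_{i:04d}" for i in range(1000)]
splits = split_logs(log_ids)

print({name: len(ids) for name, ids in splits.items()})
```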
Sensor Dataset Taxonomy
The AV2 Sensor Dataset contains 10 Hz 3D cuboid annotations for objects within our 30-class taxonomy. Objects are annotated if they are within the “region of interest” (ROI), that is, within five meters of the mapped “drivable” area.
These 30 classes are defined as follows, appearing in order of frequency:
- REGULAR_VEHICLE: Any conventionally sized passenger vehicle used for the transportation of people and cargo. This includes cars, vans, pickup trucks, SUVs, etc.
- PEDESTRIAN: Person that is not driving or riding in/on a vehicle. They can be walking, standing, sitting, prone, etc.
- BOLLARD: Bollards are short, sturdy posts installed in the roadway or sidewalk to control the flow of traffic. These may be temporary or permanent and are sometimes decorative.
- CONSTRUCTION_CONE: Movable traffic cone that is used to alert drivers to a hazard. These will typically be orange and white striped and may or may not have a blinking light attached to the top.
- CONSTRUCTION_BARREL: Movable traffic barrel that is used to alert drivers to a hazard. These will typically be orange and white striped and may or may not have a blinking light attached to the top.
- STOP_SIGN: Red octagonal traffic sign displaying the word STOP, used to notify drivers that they must come to a complete stop and make sure no other road users are coming before proceeding.
- BICYCLE: Non-motorized vehicle that typically has two wheels and is propelled by human power pushing pedals in a circular motion.
- LARGE_VEHICLE: Large motorized vehicles (four wheels or more) which do not fit into any more specific subclass. Examples include extended passenger vans, fire trucks, RVs, etc.
- WHEELED_DEVICE: Objects involved in the transportation of a person that do not fit a more specific class. Examples range from skateboards, non-motorized scooters, and segways to golf carts.
- BUS: Standard city buses designed to carry a large number of people.
- BOX_TRUCK: Chassis cab truck with an enclosed cube-shaped cargo area. It should be noted that the cargo area is rigidly attached to the cab, and they do not articulate.
- SIGN: Official road signs placed by the Department of Transportation (DOT signs) which are of interest to us. This includes yield signs, speed limit signs, directional control signs, construction signs, and other signs that provide required traffic control information. Note that Stop Sign is captured separately, and informative signs such as street signs, parking signs, bus stop signs, etc. are not included in this class.
- TRUCK: Vehicles that are clearly defined as a truck but do not fit into the subclasses of Box Truck or Truck Cab. Examples include common delivery vehicles (UPS, FedEx), mail trucks, garbage trucks, utility trucks, ambulances, dump trucks, etc.
- MOTORCYCLE: Motorized vehicle with two wheels where the rider straddles the engine. These are capable of high speeds similar to a car.
- BICYCLIST: Person actively riding a bicycle, non-pedaling passengers included.
- VEHICULAR_TRAILER: Non-motorized, wheeled vehicle towed behind a motorized vehicle.
- TRUCK_CAB: Heavy truck commonly known as “Semi cab”, “Tractor”, or “Lorry”. This refers to only the front part of an articulated tractor trailer.
- MOTORCYCLIST: Person actively riding a motorcycle or a moped, including passengers.
- DOG: Any member of the canine family.
- SCHOOL_BUS: Bus that primarily holds school children (typically yellow) and can control the flow of traffic via the use of an articulating stop sign and loading/unloading flasher lights.
- WHEELED_RIDER: Person actively riding or being carried by a wheeled device.
- STROLLER: Push-cart with wheels meant to hold a baby or toddler.
- ARTICULATED_BUS: Articulated buses perform the same function as a standard city bus, but are able to bend (articulate) towards the center. These will also have a third set of wheels not present on a typical bus.
- MESSAGE_BOARD_TRAILER: Trailer carrying a large, mounted, electronic sign to display messages. Often found around construction sites or large events.
- MOBILE_PEDESTRIAN_SIGN: Movable sign designating an area where pedestrians may cross the road.
- WHEELCHAIR: Chair fitted with wheels for use as a means of transport by a person who is unable to walk as a result of illness, injury, or disability. This includes both motorized and non-motorized wheelchairs as well as low-speed seated scooters not intended for use on the roadway.
- RAILED_VEHICLE: Any vehicle that relies on rails to move. This applies to trains, trolleys, train engines, train freight cars, train tanker cars, subways, etc.
- OFFICIAL_SIGNALER: Person with authority specifically responsible for stopping and directing vehicles through traffic.
- TRAFFIC_LIGHT_TRAILER: Mounted, portable traffic light unit commonly used in construction zones or for other temporary detours.
- ANIMAL: All recognized animals large enough to affect traffic, but that do not fit into the Cat, Dog, or Horse categories.
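When working with the category column of the annotations, it can help to validate labels against the taxonomy and tally them. The sketch below lists the 30 class names in the order given above and counts the categories visible in the example annotations table; the constant and function names are hypothetical and not taken from the av2 API.

```python
from collections import Counter

# The 30 class names, in the order listed above.
AV2_CATEGORIES = (
    "REGULAR_VEHICLE", "PEDESTRIAN", "BOLLARD", "CONSTRUCTION_CONE",
    "CONSTRUCTION_BARREL", "STOP_SIGN", "BICYCLE", "LARGE_VEHICLE",
    "WHEELED_DEVICE", "BUS", "BOX_TRUCK", "SIGN", "TRUCK", "MOTORCYCLE",
    "BICYCLIST", "VEHICULAR_TRAILER", "TRUCK_CAB", "MOTORCYCLIST", "DOG",
    "SCHOOL_BUS", "WHEELED_RIDER", "STROLLER", "ARTICULATED_BUS",
    "MESSAGE_BOARD_TRAILER", "MOBILE_PEDESTRIAN_SIGN", "WHEELCHAIR",
    "RAILED_VEHICLE", "OFFICIAL_SIGNALER", "TRAFFIC_LIGHT_TRAILER", "ANIMAL",
)


def tally_categories(categories):
    """Count annotations per category, rejecting labels outside the taxonomy."""
    unknown = sorted(set(categories) - set(AV2_CATEGORIES))
    if unknown:
        raise ValueError(f"categories not in the taxonomy: {unknown}")
    return Counter(categories)


# Categories of the ten rows visible in the example annotations table.
sample = ["BOLLARD"] * 5 + [
    "STROLLER", "TRUCK", "TRUCK_CAB", "VEHICULAR_TRAILER", "WHEELED_DEVICE",
]
counts = tally_categories(sample)

print(counts.most_common(3))
```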