Markerless Tracking Dataset

Overview
Unlike dense stereo, optical flow or multi-view stereo, template-based tracking lacks benchmark datasets that allow a fair comparison of state-of-the-art algorithms. Until now, objective and quantitative evaluations of the performance and robustness of template-based tracking algorithms have mainly relied on synthetically generated image sequences. The evaluation is therefore often intrinsically biased.

This website accompanies our ISMAR 2009 paper "A Dataset and Evaluation Methodology for Template-based Tracking Algorithms", in which we describe how we acquired real-scene image sequences with very precise and accurate ground-truth poses, using an industrial camera rigidly mounted on the end-effector of a high-precision robotic measurement arm. During acquisition, we considered most of the critical parameters that influence tracking results: the texture richness and texture repeatability of the objects to be tracked, the camera motion and speed, changes of the object scale in the images, and variations of the lighting conditions over time.
We designed an evaluation scheme for object detection and inter-frame tracking algorithms and applied it, using these image sequences, to several state-of-the-art algorithms. The image sequences are freely available for testing, submitting and evaluating new template-based tracking algorithms.

 

How to use it
Below you will find the datasets we have generated so far. Each dataset consists of a movie, an image of the tracking target, the intrinsic parameters of the camera used, and a file giving undistorted ground-truth positions for every 250th frame. Every movie has 1200 frames; we offer each movie both as captured (i.e. distorted) and rectified (i.e. undistorted, using the parameters from undist.txt). The distortion model we used is the one from Section 2.3.2 of the book "Nahbereichsphotogrammetrie" by Prof. Luhmann, second edition, 2003. There are five movies per target, focusing on "Angle", "Range", "Fast Far", "Fast Close" and "Illumination". The movies are encoded with the lossless FFV1 codec from the ffmpeg project (ffmpeg.org); a DirectShow codec is available at http://ffdshow-tryout.sourceforge.net/. If you need still images, you can convert the sequences with, e.g., VirtualDub (http://www.virtualdub.org/).
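As a command-line alternative to VirtualDub, ffmpeg itself can decode FFV1 and write one PNG per frame. The sketch below only builds and runs `ffmpeg -i <movie> <out_dir>/frame_%04d.png`; the file and directory names are placeholders, and it assumes ffmpeg is installed and on your PATH. PNG is lossless, so no further compression artifacts are added, although ffmpeg may still convert the pixel format when writing the images. Note that ffmpeg numbers image-sequence output from 1 by default, so `frame_0001.png` is the first frame of the movie; keep this offset in mind when matching still images to the frames that carry ground truth.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_cmd(movie: str, out_dir: str) -> list[str]:
    """Build an ffmpeg command that writes every frame of `movie` as a PNG file."""
    # %04d is expanded by ffmpeg to a zero-padded frame counter starting at 1.
    pattern = str(Path(out_dir) / "frame_%04d.png")
    return ["ffmpeg", "-i", movie, pattern]


def extract_frames(movie: str, out_dir: str) -> None:
    """Create `out_dir` if needed and run ffmpeg; raises on a non-zero exit code."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_extract_cmd(movie, out_dir), check=True)
```

For example, `extract_frames("angle.avi", "frames/angle")` (hypothetical file name) should yield one image per frame, i.e. 1200 PNG files for a complete sequence.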

The task is to detect the target image in the frames of the movie. All reference targets are 640x480 images. For every 250th frame, we provide the coordinates of four reference points placed at the pixels (±512, ±384) of the target's coordinate system, whose origin is at the center of the target (see the image on the right: the white frame represents the 640x480 px target, and the reference points given for initialization lie on its diagonals, outside the white frame). All images have their origin in the upper-left corner.
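One possible way to use the four reference points for initialization, assuming the target is planar (as a printed reference image would be), is to estimate the homography that maps centered template coordinates to frame pixels and then project the corners of the 640x480 target into the frame. The sketch below solves the standard 8-unknown direct linear system (with the last homography entry fixed to 1) using plain Gaussian elimination; the helper names are hypothetical, and this is not necessarily how the algorithms in the paper were initialized.

```python
# A 2-D point, either (x, y) in template coordinates or (u, v) in pixels.
Point = tuple[float, float]

# Reference points in centered template coordinates, in the oc1..oc4 order.
REF_POINTS: list[Point] = [(512.0, 384.0), (-512.0, 384.0), (-512.0, -384.0), (512.0, -384.0)]
# Corners of the 640x480 reference target in the same centered coordinates.
TARGET_CORNERS: list[Point] = [(320.0, 240.0), (-320.0, 240.0), (-320.0, -240.0), (320.0, -240.0)]


def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src: list[Point], dst: list[Point]) -> list[list[float]]:
    """3x3 homography H (with H[2][2] = 1) mapping four src points onto four dst points."""
    a: list[list[float]] = []
    b: list[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h: list[list[float]], p: Point) -> Point:
    """Apply homography h to point p, including the perspective division."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Given the four ground-truth points of a frame as `gt` (in oc1..oc4 order), `h = homography(REF_POINTS, gt)` followed by `[project(h, c) for c in TARGET_CORNERS]` gives the outline of the target in that frame. Mapping the centered coordinates to pixels of the reference image additionally requires knowing the direction of the template's y axis, which is not specified here.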

We offer to evaluate the results of your tracking algorithm and send you the outcome. With your consent, we can also publish your results on this webpage. To have your results evaluated against the ground truth we hold for every frame, please send an email to research(at)metaio.com with a tab-separated log file of your experiments attached (one per sequence), formatted like this example. Please use the same order of points as in the example, i.e. (oc1u, oc1v) is the current position of pixel (+512, +384) of the reference template, (oc2u, oc2v) corresponds to (-512, +384), (oc3u, oc3v) to (-512, -384) and (oc4u, oc4v) to (+512, -384).
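A tab-separated log in the point order described above can be produced with Python's `csv` module. The column layout below (a header row, then a frame index followed by oc1u, oc1v ... oc4u, oc4v) is a hypothetical assumption for illustration; take the exact layout, including whether a header or frame column is expected, from the example log file.

```python
import csv
import io

# Hypothetical layout: adapt to the example log file before submitting.
HEADER = ["frame", "oc1u", "oc1v", "oc2u", "oc2v", "oc3u", "oc3v", "oc4u", "oc4v"]


def write_log(stream, results) -> None:
    """Write (frame, [(u1, v1), ..., (u4, v4)]) records as tab-separated lines.

    The four points must be given in the order oc1..oc4, i.e. the current
    positions of template pixels (+512,+384), (-512,+384), (-512,-384), (+512,-384).
    """
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for frame, points in results:
        if len(points) != 4:
            raise ValueError("expected exactly four points per frame")
        row = [frame]
        for u, v in points:
            row += [f"{u:.3f}", f"{v:.3f}"]
        writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    write_log(buf, [(0, [(832.0, 624.0), (-192.0, 624.0), (-192.0, -144.0), (832.0, -144.0)])])
    print(buf.getvalue(), end="")
```

Writing to a file instead works the same way, e.g. `with open("angle_log.txt", "w", newline="") as f: write_log(f, results)` (hypothetical file name).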

We evaluate your log files and then send you the results (see the example results for SIFT below on the right). As the error measure we use the RMS distance of the four reference points from their ground-truth positions. A frame is considered successfully tracked if this RMS error is below 10 px.
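To check your own results before submitting, the measure can be approximated as below. This sketch reads "RMS of the four pixels" as the root mean square of the four point-to-point Euclidean distances, i.e. sqrt((d1² + d2² + d3² + d4²) / 4), and counts a frame as tracked when that value is strictly below 10 px; the official evaluation may differ in details such as how missing frames are handled.

```python
import math

Point = tuple[float, float]


def corner_rms(est: list[Point], gt: list[Point]) -> float:
    """RMS of the Euclidean distances between estimated and ground-truth points."""
    if len(est) != len(gt) or not gt:
        raise ValueError("point lists must be non-empty and of equal length")
    sq = [(eu - gu) ** 2 + (ev - gv) ** 2 for (eu, ev), (gu, gv) in zip(est, gt)]
    return math.sqrt(sum(sq) / len(sq))


def success_rate(est_frames: list[list[Point]], gt_frames: list[list[Point]],
                 threshold: float = 10.0) -> float:
    """Fraction of frames whose RMS error is below `threshold` pixels."""
    if len(est_frames) != len(gt_frames) or not gt_frames:
        raise ValueError("need the same, non-zero number of frames")
    ok = sum(corner_rms(e, g) < threshold for e, g in zip(est_frames, gt_frames))
    return ok / len(gt_frames)
```

For instance, if every estimated point is off by (3, 4) px, each distance is 5 px and the RMS is 5 px, so the frame counts as tracked; an offset of (6, 8) px gives exactly 10 px, which does not.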

For the evaluation results of SIFT, SURF, FERNS and ESM, please refer to our paper.

Support:
This work was partially supported by BMBF grant Avilus / 01 IM08001 P.

 

Contact info
For comments and suggestions, feel free to contact research(at)metaio.com