Evaluation metrics for Structure from Motion

Each group should produce code for automatic evaluation of the estimated camera poses on the dinosaur dataset. This should be done using the following procedure:

  1. Extract the global position and orientation of the cameras in the ground truth for the "my" sequence.
    This results in a set of K poses: $(R_k, t_k)$, $k = 1, \ldots, K$.
  2. Extract the global position and orientation of your estimated cameras on the same dataset.
    This results in a set of K poses: $(R_k^{\mathrm{est}}, t_k^{\mathrm{est}})$.
  3. Align the two sets by estimating a similarity transformation that maps your camera positions $t_k^{\mathrm{est}}$ to the ground truth positions $t_k$,
    i.e. $t_k = s\,M\,t_k^{\mathrm{est}} + m$,
    where $s$ is a scale change, $M$ is the global rotation, and $m$ is the global translation. (See e.g. the IREG compendium, Sec. 15.2, or Horn's article. A sketch of this alignment is given after the list.)
  4. Map your estimated camera centers and orientations using $(s, M, m)$.
    The camera positions are mapped as:
    $t_k^{\mathrm{mapped}} = s\,M\,t_k^{\mathrm{est}} + m$,
    and the orientations are mapped as:
    $R_k^{\mathrm{mapped}} = R_k^{\mathrm{est}} M^T$.
  5. Compute the individual errors for the camera positions as:
    $e_k = \|t_k - t_k^{\mathrm{mapped}}\|$,
    and the individual angular errors for the camera orientations using the 3D angular error (the geodesic distance on SO(3)):
    $a_k = 2\sin^{-1}\!\left(\|R_k^{\mathrm{mapped}} - R_k\|_F / \sqrt{8}\right)$,
    where $\|\cdot\|_F$ denotes the Frobenius norm. See e.g. the article by Hartley et al. for a derivation of this error measure.
  6. Summarize the errors for the entire set of poses using the mean absolute error (MAE) for each of the two error measures above. (A sketch of the mapping, error computation, and MAE summary also follows the list.)
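
The similarity alignment in step 3 can be solved in closed form (Horn's absolute-orientation problem; the sketch below follows the SVD-based Umeyama formulation of the same least-squares problem). It is a minimal illustration only, assuming NumPy and camera centers stacked as K x 3 arrays; the function name align_similarity is made up for this example.

```python
import numpy as np

def align_similarity(t_est, t_gt):
    """Estimate (s, M, m) such that t_gt ~ s * M @ t_est + m.

    t_est, t_gt : (K, 3) arrays of estimated / ground-truth camera centers.
    Closed-form least-squares similarity alignment (Horn / Umeyama).
    """
    mu_est = t_est.mean(axis=0)
    mu_gt = t_gt.mean(axis=0)
    x = t_est - mu_est                      # centered estimated positions
    y = t_gt - mu_gt                        # centered ground-truth positions

    # Cross-covariance between the two point sets; its SVD gives the rotation.
    H = y.T @ x / len(t_est)
    U, S, Vt = np.linalg.svd(H)
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:           # guard against a reflection solution
        D[2, 2] = -1.0
    M = U @ D @ Vt

    # Optimal scale and translation follow from the rotation.
    var_est = (x ** 2).sum() / len(t_est)
    s = np.trace(np.diag(S) @ D) / var_est
    m = mu_gt - s * M @ mu_est
    return s, M, m
```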
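
Steps 4-6 can then be sketched as below, again assuming NumPy, with the rotations stacked as a K x 3 x 3 array; the helper name pose_errors is likewise hypothetical. Since both error measures are non-negative, their plain means are the MAE values asked for in step 6.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt, s, M, m):
    """Map the estimated poses with (s, M, m) and compute per-camera errors.

    R_est, R_gt : (K, 3, 3) rotation matrices.
    t_est, t_gt : (K, 3) camera centers.
    Returns position errors e_k, angular errors a_k (radians), and their MAEs.
    """
    # Step 4: map the estimated centers and orientations into the ground-truth frame.
    t_mapped = (s * (M @ t_est.T)).T + m         # t_k^mapped = s M t_k^est + m
    R_mapped = R_est @ M.T                        # R_k^mapped = R_k^est M^T

    # Step 5: position error = Euclidean distance to the ground-truth center.
    e = np.linalg.norm(t_gt - t_mapped, axis=1)

    # Step 5: angular error a_k = 2 asin(||R_k^mapped - R_k||_F / sqrt(8)),
    # the geodesic distance on SO(3) expressed via the Frobenius norm.
    frob = np.linalg.norm(R_mapped - R_gt, axis=(1, 2))
    a = 2.0 * np.arcsin(np.clip(frob / np.sqrt(8.0), 0.0, 1.0))

    # Step 6: summarize with the mean absolute errors.
    return e, a, e.mean(), a.mean()
```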

If any of the steps above is unclear to you, ask your project guide.

References

  1. B. K. P. Horn. Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America A, 4(4):629-642, 1987.
  2. Klas Nordberg. Introduction to Representations and Estimation in Geometry (IREG).
  3. R. Hartley, J. Trumpf, Y. Dai, and H. Li. Rotation averaging. International Journal of Computer Vision, 103(3):267-305, July 2013.