3D reconstruction project (2024 version)

Half the course consists of a 3D reconstruction project. The project is conducted in groups of four or five students. You may propose to form a group yourselves, but do not worry: everyone who wants to take part will be assigned to a group when the project starts.

Group assignment 2024

Date                Milestone
April 25            Project start and group assignment
May 2               Design plan due
May 19 (a Sunday)   Report submission for peer review
May 21              Final seminar with project presentations

Required functionalities

Each group should implement a system with the following functionality:

  1. Robust correspondence extraction using RANSAC and the epipolar constraint on a sequence of frames.
    (i.e. like CE1, but for many image pairs, and you may want to try different correspondences, e.g. from OpenCV.)
  2. Computation of an E-matrix (e.g. from F and a known K), subsequent extraction of the twisted-pair solutions for R and t from E, and selection of the correct one using the visible-points constraint.
    (See LE6. A minimal sketch of this step is given after this list.)
  3. Robust Perspective-n-point (PnP) estimation for adding a new view and detecting outliers. Use an existing P3P solver and your own RANSAC loop.
    (See LE6,LE5)
  4. Bundle adjustment for N cameras (N≥2) using known correspondences, intrinsic camera parameters (K-matrix), and an initial guess for the set of camera poses.
    (See LE7)
  5. Improved efficiency for bundle adjustment. Options:
    • Use a sparsity mask for the Jacobian in the Bundle adjustment step (See CE1 extra task).
    • Use pytorch and autograd (See Micro Bundle Adjustment).
    • Program in C++ and use Ceres Solver, which also has automatic differentiation and a Schur complement solver. (Discuss with your guide.)
  6. Evaluate the robustness to noise of the implemented system by measuring the camera pose errors of your reconstruction compared to reference poses. Details can be found here.
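As a starting point for required functionality 2, here is a minimal numpy sketch of the twisted-pair decomposition of E and the visible-points (cheirality) test. It assumes the correspondences have already been normalized with the inverse of K; the function names are our own and the sketch is not meant as a complete solution.

import numpy as np

def decompose_essential(E):
    """Return the four candidate (R, t) pairs encoded by an essential matrix.
    Standard twisted-pair decomposition via the SVD of E; t is recovered only
    up to scale and sign."""
    U, _, Vt = np.linalg.svd(E)
    # Enforce proper rotations (determinant +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence from two 3x4 cameras
    and normalized image points x1, x2 (2-vectors)."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def select_pose(E, x1, x2):
    """Pick the (R, t) candidate for which most triangulated points lie in
    front of both cameras. x1, x2 are Nx2 arrays of K-normalized points."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    best, best_count = None, -1
    for R, t in decompose_essential(E):
        P2 = np.hstack([R, t.reshape(3, 1)])
        in_front = 0
        for a, b in zip(x1, x2):
            X = triangulate(P1, P2, a, b)
            # Depth in camera 1 is X[2]; depth in camera 2 is (R X + t)[2].
            if X[2] > 0 and (R @ X + t)[2] > 0:
                in_front += 1
        if in_front > best_count:
            best, best_count = (R, t), in_front
    return best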

Optional functionalities

In addition to the required functionalities, groups of four students should do at least one extra task from the list below. Groups of five students should do at least two extra tasks.

  1. Densification using e.g. PatchMatch, with crosschecking and the fundamental matrix constraint (See CE2)
  2. Visualisation of the 3D model. Pick one of the following:
    • Use screened Poisson surface reconstruction to obtain a volumetric representation. Try e.g. the version in Meshlab, or Kazhdan's own implementation (recommended, as it has more options). Both require you to estimate colour and surface normals for the 3D points and save these in PLY format (a small writer sketch is given after this list). Use visibility in the cameras to determine the normal signs. See LE4. Theoretical details on screened PSR can be found in Kazhdan and Hoppe.
    • Use space carving (as in Fitzgibbon, Cross and Zisserman) to generate a volumetric representation of your object.
    • Try a radiance field method, e.g. 3D Gaussian Splatting, Plenoxels or some NeRF variant. (discuss with your supervisor).
  3. Replace incremental SfM with Global SfM. Use global estimation of all rotation matrices as an initialisation of Bundle Adjustment (as in Martinec and Pajdla). Graph Attention Networks could possibly be fun to try here, but it is probably too difficult this year (GIT repo).
  4. Make use of the points on the turntable to make camera pose estimation more accurate. This is not trivial, as the turntable is a dominant plane. You may e.g. use the fact that two views of a plane are related by a homography.
  5. Add re-triangulation to your pipeline, and compare using optimal triangulation and LOST (e.g. from GTSAM) in terms of execution time and the final result quality.
  6. An idea of your own (discuss with your supervisor).
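For the screened Poisson option above, the reconstruction tools expect an oriented, coloured point cloud in PLY format. A minimal ASCII PLY writer could look like the sketch below; the property names follow the common x/y/z, nx/ny/nz, red/green/blue convention, but double-check them against the tool you use.

def save_ply(path, points, normals, colors):
    """Write an ASCII PLY file with per-point normals and RGB colours.
    points, normals: Nx3 float arrays; colors: Nx3 array of 0-255 values."""
    header = "\n".join([
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x", "property float y", "property float z",
        "property float nx", "property float ny", "property float nz",
        "property uchar red", "property uchar green", "property uchar blue",
        "end_header",
    ])
    with open(path, "w") as f:
        f.write(header + "\n")
        for p, n, c in zip(points, normals, colors):
            f.write(f"{p[0]} {p[1]} {p[2]} "
                    f"{n[0]} {n[1]} {n[2]} "
                    f"{int(c[0])} {int(c[1])} {int(c[2])}\n")

Both Meshlab and Kazhdan's PoissonRecon accept files of this kind; remember that the normal signs should be chosen using the cameras that see each point, as stated in the task.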

Datasets

  1. The turntable datasets produced for the course.
  2. For debugging it is useful to work with a noise-free dataset. For this purpose we have "cleaned" the dinosaur dataset of noise (up to the numerical accuracy of Matlab) and produced a dataset that contains 2D image points, 3D points, and the camera matrices. These are found in the BAdino2.mat file in the dataset repo above (use e.g. scipy.io.loadmat to load the file; a loading sketch is given after this list).
  3. The VGG datasets from Oxford University. This includes e.g., the dinosaur sequence.
  4. Some datasets by Carl Olsson at LTH:
    Carl Olsson's SfM datasets
  5. The EPFL multi-view stereo datasets are also highly recommended:
    Multi-view stereo datasets
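Loading the cleaned dinosaur data in Python is a one-liner with scipy.io.loadmat. The variable names inside BAdino2.mat are not listed here, so the key names in the commented lines below are placeholders; print the keys and adapt them to what the file actually contains.

from scipy.io import loadmat

# Load the noise-free dinosaur data.
data = loadmat("BAdino2.mat")

# Inspect the available variables (double-underscore keys are Matlab metadata).
print([k for k in data if not k.startswith("__")])

# Hypothetical access pattern; replace the key names with the real ones:
# pts2d = data["points2d"]    # 2D image points
# pts3d = data["points3d"]    # 3D points
# P     = data["cameras"]     # camera matrices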

Code

Please look at the utility code for CE1 and CE2. Also look at the Extra exercises in the lab sheets; they are meant to help you get started with the projects.

You are absolutely NOT allowed to use complete SfM systems such as COLMAP, SBA, Bundler, Visual SFM, SSBA, OpenMVG, or GTSAM.

On the other hand, you are encouraged to use a non-linear least-squares package (or PyTorch), and if you intend to make a more advanced representation of your 3D model, you may use a package for that as well. Some useful (and allowed) packages are listed below.
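If you go the scipy route, the sparsity-mask option in required functionality 5 maps directly onto scipy.optimize.least_squares and its jac_sparsity argument. The sketch below assumes a parameter layout of 6 pose parameters per camera followed by 3 coordinates per point; the names (reprojection_residuals, cam_idx, pt_idx, ...) are placeholders for your own code, not a prescribed interface.

import numpy as np
from scipy.sparse import lil_matrix
from scipy.optimize import least_squares

def ba_sparsity(n_cams, n_pts, cam_idx, pt_idx):
    """Jacobian sparsity mask for bundle adjustment residuals.
    Observation k is the (x, y) reprojection residual of point pt_idx[k]
    in camera cam_idx[k]; it depends only on that camera and that point."""
    cam_idx = np.asarray(cam_idx)
    pt_idx = np.asarray(pt_idx)
    m = 2 * len(cam_idx)              # number of residuals
    n = 6 * n_cams + 3 * n_pts        # number of parameters
    A = lil_matrix((m, n), dtype=int)
    k = np.arange(len(cam_idx))
    for s in range(6):                # dependence on the observing camera
        A[2 * k, 6 * cam_idx + s] = 1
        A[2 * k + 1, 6 * cam_idx + s] = 1
    for s in range(3):                # dependence on the observed 3D point
        A[2 * k, 6 * n_cams + 3 * pt_idx + s] = 1
        A[2 * k + 1, 6 * n_cams + 3 * pt_idx + s] = 1
    return A

# Hypothetical usage, with your own residual function and starting point x0:
# res = least_squares(reprojection_residuals, x0, method="trf", x_scale="jac",
#                     jac_sparsity=ba_sparsity(n_cams, n_pts, cam_idx, pt_idx),
#                     args=(K, cam_idx, pt_idx, observations))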

Deliverables

In order to pass, each project group should do the following:

  1. Make a design plan, and get it approved by the guide.
    The design plan should contain the following:
    • A list of the tasks/functionalities that will be implemented
    • A group member list, with responsibilities for each person (e.g. what task)
    • A flow-chart of the system components
    • A brief description of the components
  2. Hand in a written report to the project guide.
    The report should contain the following:
    • A group member list, with responsibilities for each person
    • A description of the problem that is solved
    • How the problem is solved
    • What the result is (i.e. performance relative to ground truth)
    • Why the result is what it is
    • References to used methods
  3. Deliver an oral presentation at the seminar
  4. Prepare an opposition on another group's project, and hand in an opposition report after the seminar.

We recommend that you use the CVPR LaTeX template when writing the documents.

References

Most of these are cached locally in the course repository (except IREG).

  1. C. Barnes et al. PatchMatch: A randomized correspondence algorithm for structural image editing, ACM Transactions on Graphics (Proceedings of SIGGRAPH) 2009.
  2. A. Fitzgibbon, G. Cross and A. Zisserman, Automatic 3D Model Construction for Turn-Table Sequences, in 3D Structure from Multiple Images of Large-Scale Environments, Editors Koch & Van Gool, Springer Verlag 1998, pages 155-170.
  3. Y. Furukawa and C. Hernández, Multiview Stereo: A Tutorial. In Foundations and Trends in Computer Graphics and Vision Vol. 9, No. 1-2, 2013.
  4. Michael Kazhdan and Hugues Hoppe, Screened Poisson Surface Reconstruction, ACM Transactions on Graphics 2013.
  5. K. Madsen et al., Methods for Non-Linear Least Squares Problems, 2nd ed., DTU technical report 2004.
  6. Daniel Martinec and Tomáš Pajdla, Robust Rotation and Translation Estimation in Multiview Reconstruction, CVPR 2007.
  7. Klas Nordberg, Introduction to Representations and Estimation in Geometry (IREG).
  8. Johannes Schönberger and Jan-Michael Frahm, Structure-from-Motion Revisited, CVPR 2016.
  9. Triggs, McLauchlan, Hartley and Fitzgibbon, Bundle Adjustment - A Modern Synthesis, in Vision Algorithms - Theory and Practice, Springer Verlag, Lecture Notes in Computer Science No 1883, 2000, pages 298-372.
  10. Daniel Scharstein and Richard Szeliski, A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, IJCV 47(1-3), 2002.
  11. Brian Curless and Marc Levoy, A Volumetric Method for Building Complex Models from Range Images, SIGGRAPH'96, 1996.
  12. William E. Lorensen and Harvey E. Cline, Marching Cubes: A High Resolution 3D Surface Construction Algorithm, SIGGRAPH'87, 1987.
  13. R. Newcombe et al., KinectFusion: Real-time Dense Surface Mapping and Tracking, ISMAR'11, 2011.
  14. C. Loop and Z. Zhang, Computing Rectifying Homographies for Stereo Vision, ICCV 1999.