user3667089

Reputation: 3278

How does SFM/MVS overcome the color difference between different cameras?

I am familiar with two-view stereo but fuzzy on how SFM (Structure from Motion) and MVS (Multi-view Stereo) exactly work.

Let's say I have two stereo pairs of cameras, (A, B) and (C, D). I can calculate the depth map for camera A using two-view stereo with cameras A and B, and similarly the depth map for camera C using two-view stereo with cameras C and D. Based on the calibration, I can turn depth map A into a point cloud and color it with color values from camera A, and likewise turn depth map C into a point cloud colored from camera C.

In a perfect world, when I overlay point cloud A and point cloud C, the result should look seamless without any obvious color problems. Unfortunately, in the real world there will be some color difference between what camera A and camera C capture for the same point in space. I tried various ways of color averaging for points that are visible in both camera A and camera C, but no matter what, there is an obvious color "seam" between the points that are only visible in camera A and the points that are visible in both camera A and camera C.
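For concreteness, this is roughly the kind of overlap averaging I mean (a minimal numpy sketch; the array names and the visibility mask are just illustrative assumptions, not part of any particular pipeline):

```python
import numpy as np

def naive_overlap_average(colors_A, colors_C, seen_by_C):
    """colors_A:  (N, 3) RGB sampled from camera A for the N points of cloud A.
    colors_C:  (N, 3) RGB sampled from camera C (only valid where seen_by_C).
    seen_by_C: (N,)   boolean mask of points also visible in camera C.
    Points seen by both cameras get the mean of the two samples; points seen
    only by A keep A's color, which is exactly where the seam shows up."""
    out = np.array(colors_A, dtype=np.float64, copy=True)
    m = np.asarray(seen_by_C, dtype=bool)
    out[m] = 0.5 * (out[m] + np.asarray(colors_C, dtype=np.float64)[m])
    return out
```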

However, this kind of color problem doesn't seem to exist in SFM and MVS, as shown in the results of COLMAP, AliceVision, and RealityCapture. I've read multiple tutorials on how SFM/MVS works, but none of them specifically explained how the color problem is overcome. Most of them focus on explaining how to generate depth and, in the case of SFM, how to estimate the intrinsics and poses. Can someone explain what method conventional SFM/MVS uses to solve the color difference? I would also appreciate a link to a tutorial or paper that explains this.

Upvotes: 1

Views: 370

Answers (2)

Ioannis Tsampras

Reputation: 46

There seems to be a misunderstanding.

SfM's primary function as an algorithm is to calculate the camera poses; the resulting point cloud is more a visual illustration of the process than a realistic representation of the scene.

What actually happens is that, unlike the depth-map process, which creates a dense and fairly homogeneous (in terms of density) point cloud, SfM creates 3D points from features matched between images. Those points do not correspond directly to pixels of the images: in a depth-map point cloud each point is the result of a pixel or patch seen by the cameras of the stereo pair, whereas here each point corresponds to a feature.

Features do not live in "normal color space": they are designed to be illumination invariant, so different lighting conditions between images do not interfere. The transform performed before features are extracted is more complex than just averaging colors between images, so the reconstruction of a point's color after the coarse point cloud has been created is different from the methods you described.
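As a rough illustration of the invariance idea (not the actual descriptor any specific pipeline uses), a toy descriptor built from image gradients and normalized to unit length ignores both an additive brightness offset and a multiplicative gain:

```python
import numpy as np

def toy_gradient_descriptor(patch):
    """Toy illumination-robust descriptor for a grayscale patch.
    Taking gradients cancels an additive brightness offset; normalizing the
    vector to unit length cancels a multiplicative gain. Real descriptors
    (e.g. SIFT) are far more elaborate but rely on the same two ideas."""
    p = np.asarray(patch, dtype=np.float64)
    gy, gx = np.gradient(p)                       # offset term disappears here
    d = np.concatenate([gx.ravel(), gy.ravel()])
    n = np.linalg.norm(d)
    return d / n if n > 0 else d                  # gain term disappears here
```

With this toy version, `toy_gradient_descriptor(patch)` and `toy_gradient_descriptor(1.3 * patch + 20)` give the same vector up to floating-point error, which is the sense in which features ignore lighting differences between images.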

Upvotes: 0

cDc

Reputation: 327

This problem needs to be explained in two different scenarios.

  1. Inaccurate SfM: one source of mismatches in color information between different views is small errors in the computed camera poses. This is especially true if the mismatch appears in consecutive views, as the illumination in the real world most probably didn't have time to change much in the interval between taking the images. The pose errors affect not only the coloring of the point cloud but, most importantly, the depth-map estimation, which in turn amplifies the error in computing the point color: an inaccurate pixel depth ends up projecting to the wrong place in the other image. The same effect occurs if SfM is accurate but the depth-map estimation algorithm does a poor job.
  2. Illumination changes: the light might differ between two views of the same scene for many reasons: the light source position changes, the camera exposure changes, atmospheric/environmental changes, etc. There are several ways to deal with this depending on the stage, i.e. SfM or MVS. In SfM it is a problem for feature matching: to be robust to illumination changes, a feature extractor usually relies on a descriptor based on some form of gradients in color space, which reduces the effect. In MVS there are several stages that rely on matching colors between views, the most important one being depth-map estimation (or any other form of dense matching). This is solved by using a cost metric robust to illumination changes; a popular example is Zero-mean Normalized Cross-Correlation (ZNCC), an improved version of NCC that addresses exactly this problem (a minimal sketch follows this list).
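For illustration, here is a minimal numpy sketch of ZNCC as a patch-matching cost; how patches are sampled and the window size are implementation choices left out here, not part of the metric:

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-8):
    """Zero-mean Normalized Cross-Correlation between two equal-size patches.
    Subtracting each patch's mean cancels additive brightness offsets;
    dividing by the patch norms cancels multiplicative gain (e.g. exposure)
    differences between the two views. Returns a score in [-1, 1]."""
    a = np.asarray(patch_a, dtype=np.float64).ravel()
    b = np.asarray(patch_b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```

A patch and a gain/offset-adjusted copy of it (e.g. `1.3 * patch + 20`) score close to 1, which is exactly the robustness to exposure and lighting changes described above.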

Going back to your problem, assuming all of the above worked fine for you: in order to obtain a nice-looking color for your point cloud, there are two popular solutions: 1) average the color from all the views that see the point, or 2) select only the "best" view per point. The obvious problem with 1) is that the resulting color will be blurred; for 2), the way the view is selected per point is very important in order to minimize the transitions between different views (there are many ways to do this, but a global approach would obviously be best).
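A minimal sketch of both options for a single point, assuming you already have the per-view color samples, unit view directions, and a unit surface normal (all inputs you would compute yourself; scoring views by viewing angle is just one common heuristic):

```python
import numpy as np

def point_color(view_colors, view_dirs, normal, mode="best"):
    """view_colors: (V, 3) RGB samples of one point, one per view seeing it.
    view_dirs:   (V, 3) unit vectors from the point towards each camera.
    normal:      (3,)   unit surface normal at the point.
    Each view is scored by how frontally it observes the surface (cosine of
    the viewing angle). 'best' keeps the single highest-scoring view;
    'average' blends all views weighted by that score (tends to blur)."""
    colors = np.asarray(view_colors, dtype=np.float64)
    scores = np.clip(np.asarray(view_dirs) @ np.asarray(normal), 0.0, None)
    if mode == "best":
        return colors[int(np.argmax(scores))]
    w = scores / (scores.sum() + 1e-8)
    return w @ colors
```

Picking the best view independently per point still leaves visible transitions wherever the winning view changes, which is why, as noted above, a global approach (making the per-point choice consistent across neighboring points) gives better results.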

Upvotes: 0
