VlassisFo

Reputation: 660

How to create point cloud from rgb & depth images?

For a university project I am currently working on, I have to create a point cloud by reading images from this dataset. These are basically video frames, and for each frame there is an rgb image along with a corresponding depth image.

I am familiar with the equation z = f*b/d; however, I am unable to figure out how the data should be interpreted. Information about the camera that was used to take the video is not provided, although the project does state the following:

"Consider a horizontal/vertical field of view of the camera 48.6/62 degrees respectively"

I have little to no experience in computer vision, and I have never encountered two fields of view being specified before. Assuming I use the depth from the image as is (for the z coordinate), how would I go about calculating the x and y coordinates of each point in the point cloud?

Here's an example of what the dataset looks like:

[rgb image] [corresponding depth image]

Upvotes: 1

Views: 1642

Answers (1)

Joe Lee

Reputation: 516

Yes, it's unusual to specify multiple fields of view. Given a typical camera (squarish pixels, minimal distortion, view vector through the image center), usually only one field-of-view angle is given -- horizontal or vertical -- because the other can then be derived from the image aspect ratio.

Specifying a horizontal angle of 48.6 degrees and a vertical angle of 62 degrees is particularly surprising here, since the image is a landscape view, where I'd expect the horizontal angle to be greater than the vertical. I'm pretty sure it's a typo: when swapped, the ratio tan(62 * pi / 360) / tan(48.6 * pi / 360) is the 640 / 480 aspect ratio you'd expect, given the image dimensions and square pixels.
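For what it's worth, you can check that ratio numerically; here's a quick sketch (in Python, purely illustrative) of the swap hypothesis:

    import math

    # Suspected swap: 62 degrees horizontal, 48.6 degrees vertical.
    ratio = math.tan(math.radians(62.0) / 2) / math.tan(math.radians(48.6) / 2)
    print(ratio)  # ~1.331, close to 640 / 480 = 1.333...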

At any rate, a horizontal angle of t is basically saying that the horizontal extent of the image, from left edge to right edge, covers an arc of t radians of the visual field, so the pixel at the center of the right edge lies along a ray rotated t / 2 radians to the right from the central view ray. This "right-hand" ray runs from the eye at the origin through the point (tan(t / 2), 0, -1) (assuming a right-handed space with positive x pointing right and positive y pointing up, looking down the negative z axis). To get the point in space at distance d from the eye, you can just normalize a vector along this ray and multiply it by d. Assuming the samples are linearly distributed across a flat sensor, I'd expect that for a given pixel at (x, y) you could calculate its corresponding ray point with:

    p = (dx * tan(hfov / 2), dy * tan(vfov / 2), -1)

where dx is 2 * (x - width / 2) / width, dy is 2 * (y - height / 2) / height, and hfov and vfov are the field-of-view angles in radians.
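Here's a minimal NumPy sketch of that recipe, assuming a 640 x 480 depth image whose values are Euclidean distances to the eye and the swapped angles from above; the function name is made up, and I've flipped the sign of dy because image rows grow downward while this space has positive y pointing up:

    import numpy as np

    def depth_to_point_cloud(depth, hfov_deg=62.0, vfov_deg=48.6):
        # Hypothetical sketch: depth holds the Euclidean distance from the
        # eye to each sample; hfov/vfov are the (swapped) fields of view.
        depth = np.asarray(depth, dtype=float)
        height, width = depth.shape

        # Pixel grid with (0, 0) at the top-left corner.
        x, y = np.meshgrid(np.arange(width), np.arange(height))

        # Normalized image coordinates in [-1, 1]; flip y so +y points up.
        dx = 2.0 * (x - width / 2.0) / width
        dy = -2.0 * (y - height / 2.0) / height

        # Ray through each pixel in a right-handed space, looking down -z.
        rays = np.stack([dx * np.tan(np.radians(hfov_deg) / 2.0),
                         dy * np.tan(np.radians(vfov_deg) / 2.0),
                         -np.ones_like(dx)], axis=-1)

        # Normalize each ray, then scale by the measured distance.
        rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
        return rays * depth[..., None]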

Note that the documentation that accompanies your sample data links to a Matlab file that shows the recommended process for converting the depth images into a point cloud and distance field. In it, the fields of view are baked together with the image dimensions into a constant factor of 570.3, which can be used to recover the field-of-view angles that the authors believed their recording device had:

    atan(320 / 570.3) * (180 / pi) * 2 ≈ 58.6

which is indeed pretty close to the 62 degrees you were given.
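The same constant also pins down the vertical angle; a short sketch (the variable names are mine, and the 640 x 480 size is read off the images):

    import math

    focal = 570.3  # pixels, from the dataset's Matlab code
    hfov = 2 * math.degrees(math.atan(320 / focal))  # ~58.6 degrees
    vfov = 2 * math.degrees(math.atan(240 / focal))  # ~45.6 degrees
    print(hfov, vfov)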

From the Matlab code, it looks like the value in the image is not the distance from a given point to the eye, but instead the distance along the view vector to a perpendicular plane containing that point ("depth", or basically "z"), so the authors can just multiply it directly by the vector (dx * tan(hfov / 2), dy * tan(vfov / 2), -1) to get the point in space, skipping the normalization step mentioned earlier.
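If the depth values are indeed plain z, the back-projection collapses to a single per-pixel multiply; a minimal sketch under the same assumptions as the snippet above (swapped angles, dy flipped so positive y points up):

    import numpy as np

    def depth_to_point_cloud_planar(depth, hfov_deg=62.0, vfov_deg=48.6):
        # Hypothetical sketch: depth holds distance along the view axis,
        # so the un-normalized ray is scaled by it directly.
        depth = np.asarray(depth, dtype=float)
        height, width = depth.shape
        x, y = np.meshgrid(np.arange(width), np.arange(height))
        dx = 2.0 * (x - width / 2.0) / width
        dy = -2.0 * (y - height / 2.0) / height  # flip: image rows grow downward
        return np.stack([dx * np.tan(np.radians(hfov_deg) / 2.0) * depth,
                         dy * np.tan(np.radians(vfov_deg) / 2.0) * depth,
                         -depth], axis=-1)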

Upvotes: 2
