Andre

Reputation: 1883

Pixel coordinates derived from real distance measurements

In my program (using MATLAB), I specified (through dragging) the pedestrian lane as my Region Of Interest (ROI) with the coordinates [7, 178, 620, 190] (xmin, ymin, width, and height respectively) using the getrect, roipoly and insertShape functions. Refer to the image below.
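For reference, a minimal sketch of that interactive selection (insertShape needs the Computer Vision Toolbox; the file name here is just an example):

```matlab
% Sketch of the interactive ROI selection described above.
v = VideoReader('pedestrians.mp4');   % hypothetical 640x480 source video
frame = readFrame(v);

imshow(frame);
roi = getrect;                        % drag a box: returns [xmin ymin width height]
% e.g. roi = [7 178 620 190] as above

annotated = insertShape(frame, 'Rectangle', roi, 'LineWidth', 2);
imshow(annotated);
```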

The video from which this snapshot is taken has a resolution of 640x480 pixels (480p).


Defining a real-world space as my ROI by mouse dragging is barbaric. That's why the ROI coordinates must be derived mathematically.

What I'm getting at is using real-world measurements from the video capture site and applying the Pythagorean theorem from where the camera is positioned.
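For example (hypothetical numbers, just to illustrate the idea): with the camera mounted camH metres above the road and the lane groundD metres away horizontally, the slant distance to the lane follows from the Pythagorean theorem.

```matlab
% Assumed values for illustration only
camH    = 6;                        % assumed camera height (m)
groundD = 15;                       % assumed horizontal distance to lane (m)
slantD  = sqrt(camH^2 + groundD^2)  % camera-to-lane slant distance (m)
```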


How do I obtain the equivalent pixel coordinates and parameters using the real-world measurements?

Upvotes: 0

Views: 954

Answers (1)

marcoresk

Reputation: 1955

I'll try to split your question into 2 smaller questions.

A) How do I obtain the equivalent pixel coordinates of an interesting point? (practical question)

Your program should be able to detect a feature/marker that you positioned at the "real-world" interesting point. The output is a coordinate in pixels. This can be done quite easily (think about QR codes, for example).
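As one concrete sketch of this idea (a printed checkerboard instead of a QR code; the file name is an example and assumes the Computer Vision Toolbox):

```matlab
% Place a printed checkerboard at the real-world point of interest,
% then detect its corners to get their pixel coordinates.
I = imread('frame_with_checkerboard.png');
[imagePoints, boardSize] = detectCheckerboardPoints(I);

imshow(I); hold on;
plot(imagePoints(:,1), imagePoints(:,2), 'r+');  % pixel coords of corners
```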

B) What is the analytical relationship between 1 point in 3D space and its pixel coordinate in the image? (theoretical question)

This is the projection equation based on the pinhole camera model, which relates the 3D coordinates X, Y, Z to the pixel coordinates x, y:

$$ s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} $$

Cool, but some details have to be explained (and there will not be any "automatic short formula"):

  1. s represents the scale factor. A single pixel in an image can be the projection of infinitely many different points, due to perspective. In your photo, a pixel containing a piece of a car (while the car is present) will be the same pixel that contains a piece of the street under the car (once the car has passed). So there is no univocal relationship starting from pixel coordinates.

  2. The matrix on the left involves the camera parameters (focal length, etc.), which are called intrinsic parameters. They have to be known to build the relationship between 3D coordinates and pixel coordinates.

  3. The matrix on the right seems trivial: it is the combination of an identity matrix, which represents rotation, and a column of zeros, which represents translation, something like T = [R|t]. Which rotation, and which translation? You have to consider that every set of coordinates is implicitly expressed in its own reference system. So you have to determine the relationship between your measurement reference system and the camera reference system: not only the position of the camera in your 3D space (Euclidean geometry), but also the orientation of the camera (angles). A numeric sketch putting these pieces together follows this list.
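Here is a minimal numeric sketch of the projection equation, combining the intrinsic matrix with a pose [R|t]. All values are assumptions for illustration, not calibrated parameters of your camera:

```matlab
% Pinhole projection of one 3D point into pixel coordinates
f  = 800;             % assumed focal length in pixels
cx = 320; cy = 240;   % principal point of a 640x480 image
K  = [f 0 cx; 0 f cy; 0 0 1];   % intrinsic matrix

R = eye(3);           % assumed orientation: camera axes = world axes
t = [0; 0; 0];        % assumed position: camera at the world origin

Xw = [2; 1; 10];      % example 3D point (m), 10 m in front of the camera
p  = K * [R t] * [Xw; 1];       % homogeneous projection
xy = p(1:2) / p(3)    % divide by the scale factor s -> pixel coords [x; y]
```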

Upvotes: 2
