Reputation: 11251
Articles on image compression often focus on generating the best possible image quality (PSNR) given a fixed compression ratio. I'm interested in the opposite: the best possible compression ratio given a maximum permissible per-pixel error. My instinct is to greedily remove the smallest coefficients in the transformed data, keeping track of the error introduced, until I can't remove any more without exceeding the maximum error. But I can't find any papers confirming this approach. Can anyone point me to a reference on this problem?
Let me give some more details. I'm trying to compress depth images from 3D scanners, not regular images, so color is not a factor. Depth images tend to have large smooth patches, but accurate discontinuities are important. Some pixels will be empty (outside the scanner's range, or with too low a confidence level) and will not require compression.
The algorithm needs to run fast: ideally at 30 fps like the Microsoft Kinect, or at least somewhere in the 100 millisecond range. The algorithm will be included in a library I distribute, and I prefer to minimize dependencies, so compression schemes that I can implement myself in a reasonably small amount of code are preferable.
Upvotes: 10
Views: 1334
Reputation: 6570
This answer won't satisfy your request for references, but it's too long to post as a comment.
First, depth buffer compression for computer-generated imagery may apply to your case. This compression is usually done at the hardware level with a transparent interface, so it is typically designed to be simple and fast. Given this, it may be worth your while to search the literature on depth buffer compression.
One of the major issues you're going to have with transform-based compressors (DCTs, wavelets, etc.) is that there's no easy way to find compact coefficients that meet your hard maximum-error criterion. (The problem you end up with looks a lot like linear programming. Wavelets can have localized behavior in most of their basis vectors, which helps somewhat, but it's still rather inconvenient.) To achieve the accuracy you desire you may need to add another refinement step, but this adds computation time and complexity, and introduces another layer of imperfect entropy coding, leading to a loss of compression efficiency.
What you want is more akin to lossless compression than lossy compression. In this light, one approach would be to simply throw away the bits under your error threshold: if your maximum allowable error is X and your depths are represented as integers, integer-divide your depths by X and then apply lossless compression.
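As a minimal sketch of that quantize-then-lossless idea (function names are mine, and numpy/zlib stand in for whatever lossless coder you prefer): using a bin width of 2X+1 and reconstructing at the bin centre guarantees the per-pixel error never exceeds X.

```python
import zlib
import numpy as np

def compress_depth(depth, max_error):
    # Quantize: each bin is 2*max_error + 1 units wide, so reconstructing
    # at the bin centre keeps every pixel within max_error of its original.
    step = 2 * max_error + 1
    q = (depth.astype(np.int64) // step).astype(np.uint16)
    return zlib.compress(q.tobytes()), depth.shape

def decompress_depth(blob, shape, max_error):
    step = 2 * max_error + 1
    q = np.frombuffer(zlib.decompress(blob), dtype=np.uint16).reshape(shape)
    # Reconstruct at the bin centre.
    return q.astype(np.int64) * step + max_error
```

Any general-purpose lossless coder would do in place of zlib; the point is that the quantization step alone enforces the hard error bound.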
Another issue you're facing is the representation of depth -- depending on your circumstances it may be a floating point number, an integer, it may be in a projective coordinate system, or even more bizarre.
Given these restrictions, I recommend a scheme like BTPC (Binary Tree Predictive Coding), as it is a more easily adapted wavelet-like scheme in which errors are more clearly localized and easier to understand and account for. BTPC has also shown good robustness across many types of images, handling both continuous gradients and sharp edges with little loss of fidelity: exactly the traits you're looking for.
Since BTPC is predictive, it doesn't matter particularly how your depth format is stored -- you just need to modify your predictor to take your coordinate system and numeric type (integer vs. floating) into account.
Since BTPC doesn't do terribly much math, it can run pretty fast on general CPUs, too, although it may not be as easy to vectorize as you'd like. (It sounds like you're possibly doing low level optimized game programming, so this may be a serious consideration for you.)
If you're looking for something simpler to implement I'd recommend a "filter" type of approach (similar to PNG) with a Golomb-Rice coder strapped on. Rather than coding the deltas perfectly to end up with lossless compression, you can code to a "good enough" degree. The advantage of doing this as compared to a quantize-then-lossless-encode style compressor is that you can potentially maintain more continuity.
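A rough single-scanline sketch of that filter-plus-Rice idea (all names are hypothetical): predict each sample from the previous reconstructed sample, round the residual to the nearest multiple of 2*max_error+1, and Golomb-Rice code it. Predicting from the reconstruction rather than the original keeps the error from drifting.

```python
def zigzag(n):
    # Map signed residuals to non-negative ints: 0,-1,1,-2,2 -> 0,1,2,3,4.
    return 2 * n if n >= 0 else -2 * n - 1

def rice_encode(value, k):
    # Golomb-Rice code: unary quotient, '0' terminator, k-bit remainder.
    q, r = value >> k, value & ((1 << k) - 1)
    return "1" * q + "0" + format(r, "0{}b".format(k))

def encode_row(row, max_error, k=2):
    """Near-lossless filter + Rice coding for one scanline (illustrative).
    Returns the bit string and the reconstruction the decoder would see."""
    step = 2 * max_error + 1
    bits, recon, prev = [], [], 0
    for x in row:
        resid = x - prev
        # Round the residual to the nearest multiple of step; floor
        # division after adding max_error handles negatives correctly.
        q = (resid + max_error) // step
        bits.append(rice_encode(zigzag(q), k))
        prev += q * step  # decoder's view of this sample
        recon.append(prev)
    return "".join(bits), recon
```

A real implementation would pick k adaptively per context, as JPEG-LS does, but this shows the error-bounding mechanics.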
Upvotes: 1
Reputation: 7307
I'd try preprocessing the image, then compressing with a general method, such as PNG.
Preprocessing for PNG (first read this)
# Snap each pixel to a nearby neighbour when the difference is within
# the threshold, creating long constant runs that PNG's filters compress well.
for y in range(height):
    for x in range(width):
        if x > 0 and abs(A[y][x-1] - A[y][x]) < threshold:
            A[y][x] = A[y][x-1]
        elif y > 0 and abs(A[y-1][x] - A[y][x]) < threshold:
            A[y][x] = A[y-1][x]
Upvotes: 0
Reputation: 1101
I think you are looking for something like the JPEG-LS algorithm, whose near-lossless mode bounds the maximum per-pixel error. It is, however, mainly designed for compression of natural or medical images, and is not tuned for depth images (which are smoother).
Upvotes: 1
Reputation: 601
Iteratively evaluating different sets of coefficients will not help your goal of being able to compress frames as quickly as they are generated, and will not help you to keep complexity low.
Depth maps are different from intensity maps in several ways that can help you.
Upvotes: 1
Reputation: 119
I think you are pretty close to the solution, but there is an issue I think you should pay some attention to.
Because different wavelet coefficients correspond to basis functions with different scales (and shifts), the error introduced by eliminating a particular coefficient depends not only on its value but also on its position (especially its scale). So the weight of a coefficient should be something like w(c) = amp(c) * F(scale, shift),
where amp(c) is the amplitude of the coefficient and F is a function that depends on the data being compressed. With weights determined this way, the problem reduces to the knapsack problem, which can be solved in many ways (for example, reorder the coefficients and eliminate the smallest ones until you reach the threshold error on a pixel affected by the corresponding basis function). The hard part is determining F(scale, shift).
You can do it in the following way. If the data you are compressing is relatively stable (for example, surveillance video), you could estimate F as the average probability of getting an unacceptable error when the component with a given scale and shift is eliminated from the wavelet decomposition. To that end, you could perform an SVD (or PCA) decomposition on historical data and compute F(scale, shift) as a sum of scalar products of the component with the given scale and shift against the eigenvectors, weighted by the eigenvalues:
F(scale, shift) = sum over i of eValue(i) * (w(scale, shift) · eVector(i))
where eValue(i) is the eigenvalue corresponding to eigenvector eVector(i), and w(scale, shift) is the wavelet function with the given scale and shift.
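Assuming weights like the above are available, the greedy elimination step could be sketched as follows (names are mine, and treating the accumulated weights as an additive error budget is a simplification of the per-pixel accounting this answer describes):

```python
def greedy_eliminate(coeffs, weights, max_error):
    """Zero out coefficients in order of increasing weight until dropping
    the next one would exceed the error budget. 'weights' plays the role
    of amp(c) * F(scale, shift) from the discussion above."""
    order = sorted(range(len(coeffs)), key=lambda i: weights[i])
    kept = list(coeffs)
    err = 0.0
    for i in order:
        if err + weights[i] > max_error:
            break  # next cheapest elimination would blow the budget
        err += weights[i]
        kept[i] = 0.0
    return kept
```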
Upvotes: 1
Reputation: 2508
I am not aware of any references for the problem you have proposed.
However, one direction I can think of is using optimization techniques to select the best coefficients. Techniques like genetic algorithms, hill climbing, and simulated annealing can be used in this regard.
Given my experience with genetic algorithms, I can suggest the following process. If you are not familiar with genetic algorithms, I recommend reading the Wikipedia page on them.
Your problem can be thought of as selecting the subset of coefficients that gives the minimum reconstruction error. Say there are N coefficients; there are then 2^N subsets, and each subset can be represented by a string of N binary digits. For example, for N=5 the string 11101 represents the subset containing all the coefficients except the fourth. With genetic algorithms it is possible to find an optimal bit string. The objective function can be chosen as the absolute error between the reconstructed and original signals. Note, however, that the error is trivially zero when all the coefficients are kept.
To get around this problem, you may choose to modulate the objective function with a function that discourages objective values near zero and increases monotonically after a threshold. A function like |log(ε + f)| may suffice.
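A sketch of such a fitness function (all names are hypothetical; 'basis' is whatever linear map reconstructs the signal from its coefficients), with the modulation applied so the trivial all-ones string is not rewarded:

```python
import numpy as np

def fitness(bits, coeffs, basis, signal, eps=1e-6):
    """Score one candidate bit string (lower is better).
    'bits' is a 0/1 mask selecting coefficients; the raw objective is
    the maximum absolute reconstruction error f, modulated by
    |log(eps + f)| so that f == 0 (keeping everything) scores badly."""
    recon = basis @ (coeffs * bits)
    f = np.abs(recon - signal).max()
    return abs(np.log(eps + f))
```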
If what I propose seems interesting to you, do let me know. I have an implementation of a genetic algorithm, but it is tailored to my needs and you might not be in a position to adapt it to this problem. I am willing to work with you on this, as the problem seems interesting to explore.
Do let me know.
Upvotes: 1
Reputation: 5784
"greedily remove the smallest coefficients" reminds me of SVD compression, where you use the data associated with the first k largest eigenvalues to approximate the data. The rest of the eigenvalues that are small don't hold significant information and can be discarded.
Large k -> high quality, low compression
Small k -> lower quality, high compression
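A minimal illustration with numpy (truncated SVD; numpy returns the singular values sorted largest-first, so keeping the first k is exactly the scheme above):

```python
import numpy as np

def svd_approx(A, k):
    # Rank-k approximation: keep only the k largest singular values
    # and their corresponding singular vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k]
```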
(disclaimer: I have no idea what I'm talking about here, but it might help)
edit:
here is a better illustration of SVD compression
Upvotes: 1