Barbara Morgan
Barbara Morgan

Reputation: 31

Improve Histogram

Ive made this method for getting me the pixel values of an image, im using it to compare 1 image against 50 other images. However it takes forever to produce outputs. Does anyone know of a way l can speed this method up? Would converting the images to Grayscale be a quicker way? If anyone could help with code, that would be great!

public static double[] GetHistogram (BufferedImage img) {
    double[] myHistogram = new double [16777216];

     for (int y = 0; y < img.getHeight(); y += 1)
     {
         for (int x = 0; x < img.getWidth(); x += 1) 
         {
              int clr =  img.getRGB(x,y); 

              Color c = new Color(img.getRGB(x, y));

              int pixelIntValue = (int) c.getBlue() * 65536 + c.getGreen() * 256 + c.getRed();

             myHistogram[pixelIntValue]++;

         }

     }
     return myHistogram;
 }

Upvotes: 3

Views: 873

Answers (2)

Matthew Pope
Matthew Pope

Reputation: 7669

TLDR: use a smaller image and read this paper.

You should try to eliminate any unnecessary function calls as @Piglet mentioned, but you should definitely keep the colors in one histogram instead of a separate histogram for R, G, and B. Aside from getting rid of the extra function calls, I think there are four things you can do to speed up your algorithm—both creating and comparing the histograms—and reduce the memory usage (because less page caching means less disk thrashing and more speed).


Use a smaller image

One of the advantages of color histogram indexing is that it is relatively independent of resolution. The color of an object does not change with the size of the image. Obviously, there are limits to this—imagine trying to match objects using a 1×1 image. However, if your images have millions of pixels (like the images from most smart phones these days), you should definitely resize it. These authors found that an image resolution of only 16×11 still produced very good results [see page 17], but even resizing down to ~100×100 pixels should still provide a significant speed-up.

BufferedImage inherits the method getScaledInstance from Image, which you can use to get a smaller image.

double scalingFactor = 0.25; //You need to choose this value to work with your images
int aSmallHeight = myBigImage.getHeight() * scalingFactor;
int aSmallWidth = myBigImage.getWidth() * scalingFactor;
Image smallerImage = myBigImage.getScaledInstance(aSmallWidth, aSmallHeight, SCALE_FAST);

Reducing your image size is the single most effective thing you can do to speed up your algorithm. If you do nothing else, at least do this.


Use less information from each color channel

This won't make as much difference for generating your histograms because it will actually require a little more computation, but it will dramatically speed up comparing the histograms. The general idea is called quantization. Basically, if you have red values in the range 0..255, they can be represented as one byte. Within that byte, some bits are more important than others.

Consider this color sample image. I placed a mostly arbitrary shade of red in the top left, and in each of the other corners, I ignored one or more bits in the red channel (indicated by the underscores in the color byte). I intentionally chose a color with lots of one bits in it so that I could show the "worst" case of ignoring a bit. (The "best" case, when we ignore a zero bit, has no effect on the color.)

Results of dropping different bits in the R channel

There's not much difference the upper right and upper left corners, even though we ignored one bit. The upper left and lower left have a visible, but minimal difference even though we ignored 3 bits. The Upper left and lower right corners are very different even though we ignored only one bit because it was the most significant bit. By strategically ignoring less significant bits, you can reduce the size of your histogram, which means there's less for the JVM to move around and fewer bins when it comes time to compare them.

Here are some solid numbers. Currently, you have 28×28×28 = 16777216 bins. If you ignore the 3 least significant bits from each color channel, you will get 25×25×25 = 32768 bins, which is 1/512 of the number of bins you are currently using. You may need to experiment with your set of images to see what level of quantization still produces acceptable results.

Quantization is very simple to implement. You can just ignore the rightmost bits by performing the bit shift operations.

int numBits = 3;
int quantizedRed = pixelColor.getRed() >> numBits;
int quantizedGreen = pixelColor.getGreen() >> numBits;
int quantizedBlue = pixelColor.getBlue() >> numBits;


Use a different color space

While grayscale might be quicker, you should not use grayscale because you lose all of your color information that way. When you're matching objects using color histograms, the actual hue or chromaticity is more important than how light or dark something is. (One reason for this is because the lighting intensity can vary across an image or even between images.) There are other representations of color that you could use that don't require you to use 3 color channels.

For example, L*a*b* (see also this) uses one channel (L) to encode the brightness, and two channels (a, b) to encode color. The a and b channels each range from -100 to 100, so if you create a histogram using only a and b, you would only need 40000 bins. The disadvantage of a histogram of only a and b is that you lose the ability to record black and white pixels. Other color spaces each have their own advantages and disadvantages for your algorithm.

It is generally not very difficult to convert between color spaces because there are many existing implementations of color space conversion functions that are freely available on the internet. For example, here is a Java conversion from RGB to L*a*b*.

If you do choose to use a different color space, be careful using quantization as well. You should apply any quantization after you do the color space conversion, and you will need to test different quantization levels because the new color space might be more or less sensitive to quantization than RGB. My preference would be to leave the image in RGB because quantization is already so effective at reducing the number of bins.


Use different data types

I did some investigating, and I notices that BufferedImage stores the image as a Raster, which uses a SampleModel to describe how pixels are stored in the data buffer. This means there is a lot of overhead just to retrieve the value of one pixel. You will achieve faster results if your image is stored as byte[] or int[]. You can get the byte array using

byte[] pixels = ((DataBufferByte) bufferedImage.getRaster().getDataBuffer()).getData();

See the answer to this previous question for more information and some sample code to convert it to a 2D array.

This last thing might not make much difference, but I noticed that you are using double for storing your histogram. You should consider whether int would work instead. In Java, int has a maximum value of > 2 billion, so overflow shouldn't be an issue (unless you are making a histogram of an image with more than 2 billion pixels, in which case, see my first point). An int uses only half as much memory as a double (which is a big deal when you have thousands or millions of histogram bins), and for many math operations they can be faster (though this depends on your hardware).


If you want to read more about color histograms for object matching, go straight to the source and read Swain and Ballard's Color Indexing paper from 1991.

Upvotes: 3

Piglet
Piglet

Reputation: 28950

Calculating a histogram with 16777216 classes is quite unusual. Most histograms are calculated for each channel separately resulting in a 256 class histogram each for R,G and B. Or just one if you convert the image to grayscale.

I am no expert in Java. I don't know how clever the compilers optimize code. But you call img.getHeight() for every row and img.getWidth() for every column of your image. I don't know how often those expressions are actually evaluated but maybe you can save some processing time if you just use 2 variables that you assign the width and height of your image to befor you start your loops.

You also call img.getRGB(x,y) twice for every pixel. Same story. Maybe it is faster to just do it once. Function calls are usually slower than reading variables from memory.

You should also think about what you are doing here. img.getRGB(x,y) gives you an integer representation for a color. Then you put that integer into a contrustor to make a Color object out of it. Then you use c.getBlue() and so on to get integer values for red, green and blue out of that Color object. Just to put it together into a integer again?

You could just use the return value of getRGB straight away and at least save 4 function calls, 3 multiplications, 3 summations...

So again given that I programmed Java for the last time like 10 years ago my function would look more like that:

public static double[] GetHistogram (BufferedImage img) {

double[] myHistogram = new double [16777216];

 int width = img.getWidth()
 int height = img.getHeight()
 for (int y = 0; y < height; y += 1)
 {
     for (int x = 0; x < width; x += 1) 
     {
          int clr =  img.getRGB(x,y);  

         myHistogram[clr]++;

     }

 }
 return myHistogram;

}

Of course the array type and size won't be correct and that whole 16777216 class histogram doesn't make sense but maybe that helps you a bit to speed things up. I'd just use a bit mask to get the red, green and blue values out of that integer and create three histograms.

Upvotes: 1

Related Questions