pceccon

Reputation: 9844

Numpy Histogram - Python

I have a collection of images for which I need to generate histograms, one histogram per pixel. That is, for a collection of n images, I have to count the values that pixel (0,0) takes across all the images and build a histogram from them, then do the same for (0,1), (0,2), and so on. I wrote the following method to do this:

import math

class ImageData:
    def generate_pixel_histogram(self, images, bins):
        """
        Generate a histogram of the image for each pixel, counting
        the values assumed for each pixel in the specified bins
        """
        max_value = 0.0
        min_value = 0.0
        for i in range(len(images)):
            image = images[i]
            max_entry = max(max(p[1:]) for p in image.data)
            min_entry = min(min(p[1:]) for p in image.data)
            if max_entry > max_value:
                max_value = max_entry
            if min_entry < min_value:
                min_value = min_entry

        interval_size = (math.fabs(min_value) + math.fabs(max_value))/bins

        for x in range(self.width):
            for y in range(self.height):
                pixel_histogram = {}
                for i in range(bins+1):
                    key = round(min_value+(i*interval_size), 2)
                    pixel_histogram[key] = 0.0
                for i in range(len(images)):
                    image = images[i]
                    value = round(Utils.get_bin(image.data[x][y], interval_size), 2)
                    pixel_histogram[value] += 1.0/len(images)
                self.data[x][y] = pixel_histogram    

Each position of the matrix stores a dictionary representing a histogram. Because I do this for every pixel and the calculation takes a considerable amount of time, it seems to me like a good candidate for parallelization. However, I have no experience with that and I don't know how to go about it.

EDIT:

I tried what @Eelco Hoogendoorn suggested and it works perfectly. But when I apply it to my code, where the data is a large number of images created with the constructor below (after the values have been calculated, so they are no longer just 0), I get an array of zeros, [0 0 0], as h. What I pass to the histogram method is an array of ImageData objects.

class ImageData(object):

    def __init__(self, width=5, height=5, range_min=-1, range_max=1):
        """
        The ImageData constructor
        """
        self.width = width
        self.height = height
        # The range of values each pixel can assume
        self.range_min = range_min
        self.range_max = range_max
        self.data = np.arange(width*height).reshape(height, width)

#Another class, just the method here
def generate_pixel_histogram(realizations, bins):
    """
    Generate a histogram of the image for each pixel, counting
    the values assumed for each pixel in a specified bins
    """
    data = np.array([image.data for image in realizations])
    min_max_range = data.min(), data.max()+1

    bin_boundaries = np.empty(bins+1)

    # Function to wrap np.histogram, passing on only the first return value
    def hist(pixel):
        h, b = np.histogram(pixel, bins=bins, range=min_max_range)
        bin_boundaries[:] = b
        return h

    # Apply this for each pixel
    hist_data = np.apply_along_axis(hist, 0, data)
    print hist_data
    print bin_boundaries

Now I get:

  hist_data = np.apply_along_axis(hist, 0, data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 104, in apply_along_axis
  outshape[axis] = len(res)
  TypeError: object of type 'NoneType' has no len()

Any help would be appreciated. Thanks in advance.

Upvotes: 1

Views: 1536

Answers (2)

Eelco Hoogendoorn

Reputation: 10759

As noted by John, the most obvious solution is to look for library functionality that does this for you. It exists, and it will be orders of magnitude more efficient than what you are doing here.

Standard NumPy has a histogram function that can be used for this purpose. If you have only a few values per pixel, it will be relatively inefficient, and it creates a dense histogram vector rather than the sparse one you produce here. Still, chances are good that the code below solves your problem efficiently.

import numpy as np

# Some example data: 128 images of 4x4 pixels
voxeldata = np.random.randint(0, 100, (128, 4, 4))
# We need to apply the same binning range to each pixel to get sensible output
globalminmax = voxeldata.min(), voxeldata.max() + 1
# Number of output bins
bins = 20
bin_boundaries = np.empty(bins + 1)

# Function to wrap np.histogram, passing on only the first return value
def hist(pixel):
    h, b = np.histogram(pixel, bins=bins, range=globalminmax)
    bin_boundaries[:] = b  # simply overwrite; result should be identical each time
    return h

# Apply this to each pixel (axis 0 runs over the images)
histdata = np.apply_along_axis(hist, 0, voxeldata)
print(bin_boundaries)
print(histdata[:, 0, 0])  # print the histogram of an arbitrary pixel
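
To apply the same idea to a list of ImageData objects like the ones in the question's edit, the per-image arrays first have to be stacked into one 3-D array. Below is a minimal sketch under a few assumptions: the `ImageData` class here is a stripped-down stand-in that only carries `data`, the random example values are made up, and the counts are divided by the number of images to get relative frequencies as in the original loop. Note that the wrapped function must actually return the counts; if it returns `None` (for example because of a missing or mis-indented `return`), `np.apply_along_axis` can fail with an error like the `TypeError` shown in the question.

```python
import numpy as np

class ImageData(object):
    """Stand-in for the question's class; only `data` is needed here."""
    def __init__(self, data):
        self.data = np.asarray(data)

def generate_pixel_histogram(realizations, bins):
    """Return per-pixel relative frequencies and the shared bin boundaries."""
    # Stack the images into one array of shape (n_images, height, width)
    data = np.array([image.data for image in realizations])
    min_max_range = data.min(), data.max() + 1

    def hist(pixel):
        h, _ = np.histogram(pixel, bins=bins, range=min_max_range)
        return h  # without this return, apply_along_axis receives None

    counts = np.apply_along_axis(hist, 0, data)
    # The boundaries are the same for every pixel, so compute them once
    _, bin_boundaries = np.histogram(data, bins=bins, range=min_max_range)
    # Convert counts to relative frequencies (explicit float division)
    return counts / float(len(realizations)), bin_boundaries

# Example with made-up data: 100 images of 5x5 pixels with values in [-1, 1)
rng = np.random.RandomState(0)
realizations = [ImageData(rng.uniform(-1, 1, (5, 5))) for _ in range(100)]
frequencies, bin_boundaries = generate_pixel_histogram(realizations, bins=3)
```

The result has shape (bins, height, width), so `frequencies[:, y, x]` is the histogram of pixel (y, x).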

But the more general message I'd like to convey, looking at your code sample and the type of problem you are working on: do yourself a favor and learn NumPy.

Upvotes: 2

John Greenall

Reputation: 1690

Parallelization certainly would not be my first port of call for optimizing this kind of thing. Your main problem is that you're doing lots of looping at the Python level, and Python is inherently slow at that. One option would be to learn how to write Cython extensions and write the histogram part in Cython, though this might take you a while. In fact, taking a histogram of pixel values is a very common task in computer vision, and it has already been implemented efficiently in OpenCV (which has Python wrappers). There are also several functions for taking histograms in the NumPy package (though they are slower than the OpenCV implementations).
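
As an illustration of removing the Python-level loops with plain NumPy, here is a rough sketch that bins every value of every image at once and then counts with a single `np.bincount` call. The helper name `pixel_histograms` and its arguments are hypothetical, and the sketch assumes that all values lie inside `value_range` (values outside it would be clipped into the end bins rather than ignored).

```python
import numpy as np

def pixel_histograms(stack, bins, value_range):
    """Per-pixel histograms of a (n_images, height, width) array.

    Returns counts of shape (bins, height, width) and the bin edges.
    Assumes every value lies inside value_range.
    """
    n, h, w = stack.shape
    edges = np.linspace(value_range[0], value_range[1], bins + 1)
    # Bin index of every value; a value equal to the upper edge
    # would get index `bins`, so clip it into the last bin
    idx = np.searchsorted(edges, stack, side="right") - 1
    idx = np.clip(idx, 0, bins - 1)
    # Give each (pixel, bin) pair a unique integer id ...
    pixel_id = np.arange(h * w).reshape(1, h, w)
    flat = (pixel_id * bins + idx).ravel()
    # ... and count all ids in one pass, with no loop over pixels or images
    counts = np.bincount(flat, minlength=h * w * bins)
    return counts.reshape(h, w, bins).transpose(2, 0, 1), edges

# Example with made-up data: 128 images of 4x4 pixels
rng = np.random.RandomState(0)
stack = rng.randint(0, 100, (128, 4, 4))
counts, edges = pixel_histograms(stack, bins=20, value_range=(0, 100))
```

Dividing `counts` by the number of images gives the relative frequencies computed in the question.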

Upvotes: 1
