Reputation: 9844
I have a problem in which I have a bunch of images for which I have to generate histograms, but I have to generate a histogram for each pixel. I.e., for a collection of n images, I have to count the values that pixel (0,0) assumed across the collection and generate a histogram, and do the same for (0,1), (0,2), and so on. I coded the following method to do this:
class ImageData:

    def generate_pixel_histogram(self, images, bins):
        """
        Generate a histogram of the image for each pixel, counting
        the values assumed by each pixel into the specified bins
        """
        max_value = 0.0
        min_value = 0.0
        for i in range(len(images)):
            image = images[i]
            max_entry = max(max(p[1:]) for p in image.data)
            min_entry = min(min(p[1:]) for p in image.data)
            if max_entry > max_value:
                max_value = max_entry
            if min_entry < min_value:
                min_value = min_entry
        interval_size = (math.fabs(min_value) + math.fabs(max_value))/bins

        for x in range(self.width):
            for y in range(self.height):
                pixel_histogram = {}
                for i in range(bins+1):
                    key = round(min_value+(i*interval_size), 2)
                    pixel_histogram[key] = 0.0
                for i in range(len(images)):
                    image = images[i]
                    value = round(Utils.get_bin(image.data[x][y], interval_size), 2)
                    pixel_histogram[value] += 1.0/len(images)
                self.data[x][y] = pixel_histogram
Here each position of a matrix stores a dictionary representing a histogram. But since I do this for each pixel, and this computation takes considerable time, it seems to me to be a good problem to parallelize. However, I don't have experience with this and I don't know how to do it.
EDIT:
I tried what @Eelco Hoogendoorn suggested and it works perfectly. But when applying it to my code, where the data are a large number of images generated with the constructor below (after the values have been calculated and are not just 0 anymore), I just get h as an array of zeros [0 0 0]. What I pass to the histogram method is an array of ImageData.
class ImageData(object):

    def __init__(self, width=5, height=5, range_min=-1, range_max=1):
        """
        The ImageData constructor
        """
        self.width = width
        self.height = height
        # The range of values each pixel can assume
        self.range_min = range_min
        self.range_max = range_max
        self.data = np.arange(width*height).reshape(height, width)


# Another class, just the method here
def generate_pixel_histogram(realizations, bins):
    """
    Generate a histogram of the image for each pixel, counting
    the values assumed by each pixel into the specified bins
    """
    data = np.array([image.data for image in realizations])
    min_max_range = data.min(), data.max()+1

    bin_boundaries = np.empty(bins+1)

    # Function to wrap np.histogram, passing on only the first return value
    def hist(pixel):
        h, b = np.histogram(pixel, bins=bins, range=min_max_range)
        bin_boundaries[:] = b
        return h

    # Apply this for each pixel
    hist_data = np.apply_along_axis(hist, 0, data)
    print hist_data
    print bin_boundaries
Now I get:
hist_data = np.apply_along_axis(hist, 0, data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 104, in apply_along_axis
outshape[axis] = len(res)
TypeError: object of type 'NoneType' has no len()
Any help would be appreciated. Thanks in advance.
Upvotes: 1
Views: 1536
Reputation: 10759
As noted by john, the most obvious solution to this is to look for library functionality that will do this for you. It exists, and it will be orders of magnitude more efficient than what you are doing here.
Standard numpy has a histogram function that can be used for this purpose. If you have only a few values per pixel, it will be relatively inefficient, and it creates a dense histogram vector rather than the sparse one you produce here. Still, chances are good the code below solves your problem efficiently.
import numpy as np

# some example data; 128 images of 4x4 pixels
voxeldata = np.random.randint(0, 100, (128, 4, 4))

# we need to apply the same binning range to each pixel to get sensible output
globalminmax = voxeldata.min(), voxeldata.max()+1

# number of output bins
bins = 20

bin_boundaries = np.empty(bins+1)

# function to wrap np.histogram, passing on only the first return value
def hist(pixel):
    h, b = np.histogram(pixel, bins=bins, range=globalminmax)
    bin_boundaries[:] = b   # simply overwrite; result should be identical each time
    return h

# apply this for each pixel
histdata = np.apply_along_axis(hist, 0, voxeldata)
print bin_boundaries
print histdata[:, 0, 0]   # print the histogram of an arbitrary pixel
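As a quick sanity check (an extra snippet of mine, not part of the code above): since the chosen range covers every value in the data, each per-pixel histogram should sum to the number of images:

# extra sanity check, assuming it runs right after the snippet above
assert (histdata.sum(axis=0) == len(voxeldata)).all()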
But the more general message I'd like to convey, looking at your code sample and the type of problem you are working on, is this: do yourself a favor and learn numpy.
Upvotes: 2
Reputation: 1690
Parallelization certainly would not be my first port of call in optimizing this kind of thing. Your main problem is that you're doing lots of looping at the Python level. Python is inherently slow at this kind of thing. One option would be to learn how to write Cython extensions and write the histogram bit in Cython. This might take you a while. Actually, taking a histogram of pixel values is a very common task in computer vision and it has already been efficiently implemented in OpenCV (which has python wrappers). There are also several functions for taking histograms in the numpy python package (though they are slower than the OpenCV implementations).
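For reference, here is a minimal sketch of the library calls mentioned above (assuming the cv2 python wrapper and numpy are installed). Note that cv2.calcHist and np.histogram as used here compute the histogram of the pixel values of a single image; for the per-pixel-across-images histograms of the original question you would still combine them with something like the numpy approach in the other answer:

import numpy as np
import cv2  # OpenCV python wrapper; assumed to be installed

# one example 8-bit grayscale image
img = np.random.randint(0, 256, (4, 4)).astype(np.uint8)

# OpenCV: histogram of the pixel values of one image, 256 bins over [0, 256)
cv_hist = cv2.calcHist([img], [0], None, [256], [0, 256])

# numpy equivalent of the same per-image histogram
np_hist, np_edges = np.histogram(img.ravel(), bins=256, range=(0, 256))

print cv_hist.ravel()[:10]
print np_hist[:10]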
Upvotes: 1