Reputation: 379
I want to find the probability distributions of two images so I can calculate the KL divergence between them.
I'm trying to figure out what a probability distribution means in this sense. I've converted my images to grayscale, flattened them to 1-D arrays, and plotted them as histograms with bins=256:
import matplotlib.pyplot as plt

imageone = imgGray.flatten()  # array([0.64991451, 0.65775765, 0.66560078, ...,
imagetwo = imgGray2.flatten()
plt.hist(imageone, bins=256, label='image one')
plt.hist(imagetwo, bins=256, alpha=0.5, label='image two')
plt.legend(loc='upper left')
My next step is to call the ks_2samp function from scipy to calculate the divergence, but I'm unclear what arguments to use.
A previous answer explained that we should "take the histogram of the image (in grayscale) and then divide the histogram values by the total number of pixels in the image. This will result in the probability to find a gray value in the image."
Ref: Can Kullback-Leibler be applied to compare two images?
But what do we mean by "take the histogram values"? How do I 'take' these values?
I might be overcomplicating things, but I'm confused by this.
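For example, is "taking the histogram values" something like this? (Just a guess on my part, using np.histogram and assuming my grayscale values lie in [0, 1].)

import numpy as np

# Counts for 256 bins spanning the grayscale range [0, 1]
counts, bin_edges = np.histogram(imageone, bins=256, range=(0, 1))

# Divide by the total number of pixels so the values sum to 1,
# i.e. the probability of finding each gray level in the image
probs = counts / imageone.size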
Upvotes: 0
Views: 1701
Reputation: 6532
The hist function will return 3 values, the first of which is the values (i.e., the counts) in each histogram bin. If you pass the density=True argument to hist, these values will be the probability density in each bin, i.e.:
prob1, _, _ = plt.hist(imageone, bins=256, density=True, label='image one')
prob2, _, _ = plt.hist(imagetwo, bins=256, density=True, alpha=0.5, label='image two')
You can then calculate the KL divergence using the scipy entropy function:
from scipy.stats import entropy
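# With a second argument, entropy computes the relative entropy
# (Kullback-Leibler divergence) of prob1 with respect to prob2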
entropy(prob1, prob2)
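One caveat: the two histograms are only comparable bin-by-bin if they share the same bin edges, which separate calls to hist with bins=256 don't guarantee. A minimal sketch of one way to enforce that (assuming, as in your question, gray values in [0, 1]):

import numpy as np
from scipy.stats import entropy

# Explicit shared edges: bin i covers the same gray-value range in both images
edges = np.linspace(0, 1, 257)
prob1, _ = np.histogram(imageone, bins=edges, density=True)
prob2, _ = np.histogram(imagetwo, bins=edges, density=True)

# entropy normalizes its inputs, so densities work as well as probabilities;
# note the result is inf if prob2 is zero anywhere prob1 is not
print(entropy(prob1, prob2))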
Upvotes: 2