Ekaterina Burakova

Reputation: 45

Point-wise bin number and bin height for XY data (using Python)

In a 2D histogram plot of an XY distribution, how can I find out which bin number and which bin height correspond to each point?

How can I properly visualize the result (preferably with seaborn)?

Upvotes: 0

Views: 185

Answers (1)

Ekaterina Burakova

Reputation: 45

So, I want to create a plot in which my x, y data points are superimposed on a histogram calculated with numpy.histogram2d.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(9)

x = np.round(10*np.random.rand(12), 1)
y = np.round(10*np.random.rand(12), 1)

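# +1 pads the upper limits, so np.digitize also puts the maximum values in the last bin (5), not in 6
binrange = ([x.min(), x.max()+1], [y.min(), y.max()+1])

h, ex, ey = np.histogram2d(x, y, bins=5, range=binrange, density=False)
# np.digitize returns 1-based bin numbers
nx = np.digitize(x, bins=ex)
ny = np.digitize(y, bins=ey)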

print('Why do my points fall into empty bins??')
print('Values:', '\n', x, '\n', y, '\n')
print('Bins', '\n', ex, '\n', ey, '\n')
print('Bin numbers:\n', nx, '\n', ny, '\n')

sns.histplot(x=x, y=y, bins=5, binrange=binrange, cbar=True)
sns.scatterplot(x=x, y=y, s=15, color='k')
plt.suptitle('What I expect to see')

Output:

 Values: 
 [0.1 5.  5.  1.3 1.4 2.2 4.2 2.5 0.8 3.5 1.7 8.8] 
 [9.5 0.4 7.  5.7 9.  6.7 5.5 7.  3.9 6.9 8.2 4.7] 

Bins 
 [0.1  2.04 3.98 5.92 7.86 9.8 ] 
 [ 0.4   2.42  4.44  6.46  8.48 10.5 ] 

Bin numbers:
 [1 3 3 1 1 2 3 2 1 2 1 5] 
 [5 1 4 3 5 4 3 4 2 4 4 3] 

[Figure: expected plot]
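The bin numbers answer the first half of the question; the bin height of each point can then be read straight out of h. NumPy's histogram2d counts x along the first axis of h and y along the second, so h[x_bin, y_bin] is the height of a bin. A minimal sketch, reusing the same data and assuming every point lies inside binrange (true here because of the +1 padding):

```python
import numpy as np

np.random.seed(9)
x = np.round(10 * np.random.rand(12), 1)
y = np.round(10 * np.random.rand(12), 1)

binrange = ([x.min(), x.max() + 1], [y.min(), y.max() + 1])
h, ex, ey = np.histogram2d(x, y, bins=5, range=binrange)

# np.digitize returns 1-based bin numbers; subtract 1 to index into h
ix = np.digitize(x, bins=ex) - 1
iy = np.digitize(y, bins=ey) - 1

# h is indexed as h[x_bin, y_bin], so fancy indexing yields one height per point
heights = h[ix, iy]

for xi, yi, bx, by, hgt in zip(x, y, ix + 1, iy + 1, heights):
    print(f"point ({xi:.1f}, {yi:.1f}): bin ({bx}, {by}), height {hgt:.0f}")
```

Points that share a bin get the same height, and every height is at least 1 because each point counts itself.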

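A small trick here is to rotate the calculated histogram correctly with np.rot90. np.histogram2d returns h with x along the first axis (rows) and y along the second (columns), whereas imshow draws rows vertically and columns horizontally: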

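plt.figure()  # new figure, so this plot is not drawn over the previous one
# after rot90, rows follow y (highest y bin in row 0) and columns follow x
plt.imshow(np.rot90(h, 1),
           extent=[x.min(), x.max()+1, y.min(), y.max()+1], origin='upper', cmap='Blues')
plt.colorbar()
plt.scatter(x=x, y=y, s=10, color='k')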
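The rotation can be checked numerically. A minimal sketch showing that np.rot90(h, 1) is the same array as h transposed and flipped upside down, so displaying h.T with origin='lower' gives an identical picture:

```python
import numpy as np

np.random.seed(9)
x = np.round(10 * np.random.rand(12), 1)
y = np.round(10 * np.random.rand(12), 1)
binrange = ([x.min(), x.max() + 1], [y.min(), y.max() + 1])
h, ex, ey = np.histogram2d(x, y, bins=5, range=binrange)

# h[i, j]: x bin i along the rows, y bin j along the columns
rotated = np.rot90(h, 1)

# A 90-degree counterclockwise rotation equals "transpose, then flip the rows",
# so imshow(h.T, origin='lower') draws the same image as imshow(rotated, origin='upper')
print(np.array_equal(rotated, np.flipud(h.T)))  # True

# Top-left cell of the rotated array: lowest x bin, highest y bin
print(rotated[0, 0] == h[0, -1])  # True
```

With the transposed form the extent can be taken directly from the edges, e.g. plt.imshow(h.T, origin='lower', extent=[ex[0], ex[-1], ey[0], ey[-1]], cmap='Blues'), which is the pattern used in the NumPy histogram2d documentation.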

[Figure: imshow plot of the rotated histogram with the points overlaid]

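Thus, the problem is almost solved. However, making the last plot with sns.heatmap takes a bit more work. The major problem there is that sns.heatmap has no extent argument: it places the cell in row i, column j in the unit square spanning [j, j+1] horizontally and [i, i+1] vertically, and inverts the y axis so that row 0 is at the top. Instead of setting the extent, we can scale the original data to the limits (0, number_of_bins), with the limits reversed for y.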
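Written out, with $N$ bins per axis, the rescaling below (limits $(0, N)$ for $x$ and the reversed pair $(N, 0)$ for $y$) is the linear map

$$
x' = N\,\frac{x - x_{\min}}{(x_{\max}+1) - x_{\min}}, \qquad
y' = N - N\,\frac{y - y_{\min}}{(y_{\max}+1) - y_{\min}}
$$

so a point in the $i$-th $x$ bin (counting from 0) lands in $[i, i+1]$, i.e. heatmap column $i$, and points in the highest $y$ bin land in row 0 at the top. Here $N = 5$.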

For example:

def transform(distrA, limitsA, limitsB):
    '''Linearly maps points from space A to space B, one dimension at a time.

    Input:
    distrA - numpy 2D array [[coords along dim 1 ...], [coords along dim 2 ...], ...] -
             points to be transformed; row i holds the i-th coordinate of every point.
    limitsA and limitsB - sequences of (lower, upper) pairs, one per dimension -
             limits of spaces A and B, respectively. A pair may be reversed,
             e.g. (5, 0), to flip that axis.

    Output:
    distrB - transformed points, same shape as distrA'''

    distrA = np.asarray(distrA, dtype=float)
    distrB = np.empty(shape=distrA.shape)
    for i in range(distrA.shape[0]):
        loA, hiA = limitsA[i]
        loB, hiB = limitsB[i]
        # map [loA, hiA] linearly onto [loB, hiB] for all points along dimension i
        distrB[i] = (hiB - loB) * (distrA[i] - loA) / (hiA - loA) + loB

    return distrB

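plt.figure()  # new figure for the heatmap
hm = sns.heatmap(np.rot90(h, 1), cmap='Blues', annot=True)
# rescale the points to heatmap cell coordinates; the y limits are reversed
# because sns.heatmap draws row 0 at the top
h_trans = transform(np.asarray([x, y]),
                    [[x.min(), x.max()+1], [y.min(), y.max()+1]],
                    ((0, 5), (5, 0)))

sns.scatterplot(x=h_trans[0], y=h_trans[1], s=20, color='k', ax=hm)
plt.title('Desired seaborn heatmap')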

[Figure: desired seaborn heatmap]
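The same rescaling can also be done without an explicit loop. A minimal sketch, assuming a purely linear map onto the heatmap's cell coordinates; to_heatmap_coords is a hypothetical helper (not part of seaborn), and np.interp clamps any point outside the bin range to the border:

```python
import numpy as np

np.random.seed(9)
x = np.round(10 * np.random.rand(12), 1)
y = np.round(10 * np.random.rand(12), 1)
nbins = 5
binrange = ([x.min(), x.max() + 1], [y.min(), y.max() + 1])
h, ex, ey = np.histogram2d(x, y, bins=nbins, range=binrange)


def to_heatmap_coords(x, y, ex, ey):
    """Hypothetical helper: map data coordinates onto sns.heatmap cell coordinates.

    Column j of the heatmap spans [j, j+1] along x. Row 0 is drawn at the top,
    so the y coordinate runs from n_y (lowest y) down to 0 (highest y).
    """
    n_x, n_y = len(ex) - 1, len(ey) - 1
    hx = np.interp(x, [ex[0], ex[-1]], [0, n_x])
    hy = np.interp(y, [ey[0], ey[-1]], [n_y, 0])  # reversed: high y -> row 0
    return hx, hy


hx, hy = to_heatmap_coords(x, y, ex, ey)
print(np.round(hx, 2))
print(np.round(hy, 2))

# With seaborn, the points would then be overlaid on the rotated histogram:
#   ax = sns.heatmap(np.rot90(h, 1), cmap='Blues', annot=True)
#   sns.scatterplot(x=hx, y=hy, s=20, color='k', ax=ax)
```

np.interp requires increasing breakpoints, which ex and ey are; reversing only the target values [n_y, 0] is what flips the y axis.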

Upvotes: 2
