Reputation: 45
In a 2D histogram plot of an XY distribution, how do I know which bin number and bin height correspond to each point?
How to properly visualize the result (preferably, with seaborn)?
Upvotes: 0
Views: 185
Reputation: 45
So, I want to create a plot where my x, y data points are superimposed on a histogram calculated with numpy.histogram2d.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(9)
x = np.round(10*np.random.rand(12), 1)
y = np.round(10*np.random.rand(12), 1)
binrange = ([x.min(), x.max()+1], [y.min(), y.max()+1])
h, ex, ey = np.histogram2d(x, y, bins=5, range=binrange, density=False)
nx = np.digitize(x, bins=ex)
ny = np.digitize(y, bins=ey)
print('Why do my points fall into empty bins??')
print('Values:', '\n', x, '\n', y, '\n')
print('Bins', '\n', ex, '\n', ey, '\n')
print('Bin numbers:\n', nx, '\n', ny, '\n')
sns.histplot(x=x, y=y, bins=5, binrange=binrange, cbar=True)
sns.scatterplot(x=x, y=y, s=15, color='k')
plt.suptitle('What I expect to see')
Output:
Values:
[0.1 5. 5. 1.3 1.4 2.2 4.2 2.5 0.8 3.5 1.7 8.8]
[9.5 0.4 7. 5.7 9. 6.7 5.5 7. 3.9 6.9 8.2 4.7]
Bins
[0.1 2.04 3.98 5.92 7.86 9.8 ]
[ 0.4 2.42 4.44 6.46 8.48 10.5 ]
Bin numbers:
[1 3 3 1 1 2 3 2 1 2 1 5]
[5 1 4 3 5 4 3 4 2 4 4 3]
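A small trick here is to rotate the calculated histogram with np.rot90, because np.histogram2d puts x along the first axis (rows) of h, whereas imshow draws rows along the vertical axis: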
plt.imshow(np.rot90(h, 1),
           extent=[x.min(), x.max()+1, y.min(), y.max()+1], origin='upper', cmap='Blues')
plt.colorbar()
plt.scatter(x=x, y=y, s=10, color='k')
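To answer the "which bin and what height" part directly, here is a minimal sketch that reuses the data from the question. It assumes every point lies inside the histogram range. The names ix, iy and heights are introduced here for illustration only. np.digitize numbers bins from 1, so subtracting 1 gives 0-based indices, and h is indexed as h[x_bin, y_bin].

```python
import numpy as np

np.random.seed(9)
x = np.round(10 * np.random.rand(12), 1)
y = np.round(10 * np.random.rand(12), 1)
binrange = ([x.min(), x.max() + 1], [y.min(), y.max() + 1])
h, ex, ey = np.histogram2d(x, y, bins=5, range=binrange)

# np.digitize returns 1-based bin numbers for in-range values,
# so subtract 1 to index into h. The clip is a guard for a point
# lying exactly on the last edge (an assumption-level safety net).
ix = np.clip(np.digitize(x, ex) - 1, 0, h.shape[0] - 1)
iy = np.clip(np.digitize(y, ey) - 1, 0, h.shape[1] - 1)

# First axis of h is the x bin, second axis is the y bin.
heights = h[ix, iy]

for xi, yi, i, j, c in zip(x, y, ix, iy, heights):
    print(f"point ({xi}, {yi}) -> bin ({i}, {j}), height {int(c)}")

# rot90 with origin='upper' shows the same picture as h.T with origin='lower'.
same_layout = np.array_equal(np.rot90(h, 1), h.T[::-1])
```

Reading h[nx, ny] with the raw 1-based output of np.digitize, or reading h with rows as y, is what makes points appear to sit in empty bins.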
This almost solves the problem. Making the same plot with sns.heatmap takes a bit more work: heatmap draws its cells on integer axis coordinates and has no extent argument, so the scatter points no longer line up. Alternatively, we can scale the original data to the limits (0, number_of_bins).
For example:
def transform(distrA, limitsA, limitsB):
    '''Transforms a distribution of unevenly distributed points in space A to space B.
    Input:
    distrA - numpy 2D array [[arrdim1 ...], [arrdim2 ...], [arrdim3 ...], [arrdim4 ...]] -
    Distribution to be transformed.
    limitsA and limitsB - (array of pairs) -
    Limits of space A and B, correspondingly, in the form (lower, higher)
    Output:
    distrB - transformed distribution'''
    shape = distrA.shape
    distrB = np.empty(shape=shape)
    for i in range(shape[0]):
        spanA = limitsA[i][1] - limitsA[i][0]
        spanB = limitsB[i][1] - limitsB[i][0]
        for j in range(shape[1]):
            # Linear map of dimension i from limitsA[i] onto limitsB[i]
            distrB[i, j] = spanB * (distrA[i, j] - limitsA[i][0]) / spanA + limitsB[i][0]
    return distrB
hm = sns.heatmap(np.rot90(h, 1), cmap='Blues', annot=True)
# The y limits are reversed (5, 0) because the heatmap's y-axis runs from top to bottom
h_trans = transform(np.asarray([x, y]),
                    [[x.min(), x.max()+1], [y.min(), y.max()+1]],
                    ((0, 5), (5, 0)))
sns.scatterplot(x=h_trans[0], y=h_trans[1], s=20, color='k')
plt.title('Desired seaborn heatmap')
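The same linear rescaling can probably be written without the explicit loops by using NumPy broadcasting. The sketch below is one possible variant, not a drop-in replacement tested against every input shape; the name transform_vectorized is hypothetical, and it assumes distr_a has one row per dimension, as in the call above.

```python
import numpy as np

def transform_vectorized(distr_a, limits_a, limits_b):
    """Linearly map each row of distr_a from limits_a[i] onto limits_b[i]."""
    distr_a = np.asarray(distr_a, dtype=float)
    limits_a = np.asarray(limits_a, dtype=float)
    limits_b = np.asarray(limits_b, dtype=float)
    span_a = limits_a[:, 1] - limits_a[:, 0]
    span_b = limits_b[:, 1] - limits_b[:, 0]
    # Column vectors so that each row gets its own scale and offsets
    scale = (span_b / span_a)[:, None]
    return (distr_a - limits_a[:, [0]]) * scale + limits_b[:, [0]]

# Small check with made-up points: x maps 0..10 onto 0..5,
# y maps 0..10 onto 5..0 (reversed, as for the heatmap axis).
pts = np.array([[0.0, 5.0, 10.0],
                [0.0, 5.0, 10.0]])
out = transform_vectorized(pts, [(0, 10), (0, 10)], [(0, 5), (5, 0)])
print(out)
```

Passing a reversed pair such as (5, 0) as the target limits flips the axis, which is what the heatmap overlay relies on.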
Upvotes: 2