Reputation: 255
I have three columns like so:
x y Value
0.5 0.5 3
2.3 1.2 5
2.7 1.6 10
3.3 4.1 4
3.5 4.2 6
3.8 4.6 8
I want to bin columns x and y and find the mean of column Value. ie. the average would be found between points (3,3) and (4,4), which would equal to 4+6+8/3 = 6. So the output should be like so:
x_bin y_bin mean_value
0, 1 0, 1 3
0, 1 1, 2 0
0, 1 2, 3 0
0, 1 3, 4 0
1, 2 0, 1 0
1, 2 1, 2 0
1, 2 2, 3 0
1, 2 3, 4 0
2, 3 0, 1 0
2, 3 1, 2 7.5
2, 3 2, 3 0
2, 3 3, 4 0
3, 4 0, 1 0
3, 4 1, 2 0
3, 4 2, 3 0
3, 4 3, 4 6
Ideally, I would like the output in a format where I could plot this as a heatmap grid.
Thanks in advance.
Upvotes: 0
Views: 123
Reputation: 80329
np.histogram2d
is numpy's function to bin 2D data. By default, the result counts the points into each bin. With the weights=
parameter, the points are weighted, and those weights are summed. Dividing the summed weights by the counts gives the means.
Seaborn's sns.heatmap
can display these means and automatically annotate the cell values.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
x = [0.5, 2.3, 2.7, 3.3, 3.5, 3.8]
y = [0.5, 1.2, 1.6, 4.1, 4.2, 4.6]
value = [3, 5, 10, 4, 6, 8]
bins = (np.arange(6), np.arange(6))
sums, _, _ = np.histogram2d(x, y, bins=bins, weights=value)
counts, _, _ = np.histogram2d(x, y, bins=bins)
with np.errstate(divide='ignore', invalid='ignore'): # divide 0 by 0 results in NaN
means = sums / counts
sns.set_style('white')
ax = sns.heatmap(means, annot=True, fmt='.1f', cmap='turbo', vmin=0, vmax=10, square=True,
cbar=True, cbar_kws={'ticks': np.arange(11)})
ax.set_xticks(range(len(bins[0])))
ax.set_xticklabels(bins[0])
ax.set_yticks(range(len(bins[1])))
ax.set_yticklabels(bins[1])
ax.tick_params(labelrotation=0)
ax.grid(axis='both', color='0.3', clip_on=False)
ax.set_axisbelow(False)
plt.tight_layout()
plt.show()
The image shows the plots for:
bins = (np.arange(6), np.arange(6))
,bins = ([0, 2, 4, 6], [0, 2, 4, 6])
andbins = (np.arange(0, 5.0001, 0.5), np.arange(0, 5.0001, 0.5))
Upvotes: 1