Reputation: 298
I have a dataset of coordinates and in each row there are the following columns:
I'm trying to find out whether some areas have less accurate coordinates than others. My current approach is a scatterplot with the hue mapped to the accuracy. A higher accuracy value means a worse measurement, so darker spots in the plot mark areas with worse accuracy overall.
This technically solves my problem, but it takes ages to compute (there are over 800 thousand rows), and I'm not sure it's the best way to achieve what I want. I've used 2D histograms with this dataset and they work like a charm and are super quick. The issue is that they are always colored by density, and I was wondering whether it is possible to color the histogram by the average accuracy value of each bin instead.
If there are other solutions that would fit this problem then I'm also all ears. I only mentioned these two because they are the only ones I can think of.
Upvotes: 1
Views: 494
Reputation: 298
I've found a way to do it. It's still not perfect, but it fits my needs.
The idea is to bin the data manually using pandas:
coordinates['x_bin'] = pd.cut(coordinates['x_coordinate'], bins=30)
coordinates['y_bin'] = pd.cut(coordinates['y_coordinate'], bins=30)
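To see what this binning step produces, here is a minimal sketch on made-up data. The `toy` DataFrame and its value ranges are assumptions for illustration only; the column names mirror the ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `coordinates` DataFrame
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "x_coordinate": rng.uniform(0, 100, size=1000),
    "y_coordinate": rng.uniform(0, 50, size=1000),
})

# With an integer `bins`, pd.cut splits the observed range of the column
# into that many equal-width intervals and returns a categorical column
toy["x_bin"] = pd.cut(toy["x_coordinate"], bins=30)
toy["y_bin"] = pd.cut(toy["y_coordinate"], bins=30)

n_x_bins = len(toy["x_bin"].cat.categories)
n_y_bins = len(toy["y_bin"].cat.categories)
print(n_x_bins, n_y_bins)

# Each category is a pandas Interval describing the bin edges
print(toy["x_bin"].cat.categories[0])
```

Each row now carries the interval it falls into, which is what makes the later group-by possible.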
Then I group by those two and plot using a heatmap like this:
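# observed=False keeps empty bins in the grid (they show up as NaN in the heatmap)
grouped = coordinates.groupby(['x_bin', 'y_bin'], as_index=False, observed=False)['accuracy'].mean()
# pivot requires keyword arguments in pandas 2.0 and later
data = grouped.pivot(index='y_bin', columns='x_bin', values='accuracy')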
fig, ax = plt.subplots(figsize=(20,10))
sns.heatmap(data, ax=ax, cmap='mako_r')
ax.invert_yaxis()
The resulting plot looks like the following:
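As a possible alternative that stays closer to the 2D histogram from the question, the per-bin mean can also be computed with NumPy by dividing a weighted histogram (sum of accuracy per bin) by the plain count histogram. This is only a sketch: the helper name `mean_per_bin` and the synthetic data are assumptions for illustration, not part of the original solution.

```python
import numpy as np

def mean_per_bin(x, y, values, bins=30):
    """Return the mean of `values` in each 2D bin, plus the bin edges.

    Hypothetical helper: empty bins are returned as NaN.
    """
    # Sum of values per bin (histogram weighted by the values)
    sums, x_edges, y_edges = np.histogram2d(x, y, bins=bins, weights=values)
    # Number of points per bin, reusing the same edges
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    # Divide only where a bin has points; leave the rest as NaN
    means = np.full_like(sums, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means, x_edges, y_edges

# Synthetic stand-in for the real data (assumed shapes and ranges)
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=10_000)
y = rng.uniform(0, 50, size=10_000)
accuracy = rng.uniform(1, 20, size=10_000)

means, x_edges, y_edges = mean_per_bin(x, y, accuracy, bins=30)
print(means.shape)
```

The result can be drawn with matplotlib's `ax.pcolormesh(x_edges, y_edges, means.T)`; the transpose is needed because `np.histogram2d` puts x along the first axis. `scipy.stats.binned_statistic_2d` with `statistic='mean'` should give the same numbers if SciPy is available.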
Upvotes: 1