Reputation: 1344
I've got a bunch of plots to make with large numbers of points in them. When I try to do it with matplotlib it takes hours, which isn't practical. What alternative approaches exist?
The relevant bit of my code is as follows, where the number of points for each feature could easily be 100,000:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

marker = 'o'
s = 10
patches = []
for feature, color in zip(features, colors):
    # plot each point individually
    for point, value in zip(tsne, df[feature].values):
        try:
            plt.scatter(point[0], point[1], alpha=value, facecolor=color,
                        marker=marker, s=s, label=feature)
        except:
            pass
    patches.append(mpatches.Rectangle((0, 0), 1, 1, fc=color))
plt.legend(patches, features, prop={'size': 15}, loc='center left',
           bbox_to_anchor=(1, 0.5))
plt.show()
Upvotes: 0
Views: 377
Reputation: 103
Your inner loop calls plt.scatter once per point:
for point, value in zip(tsne, df[feature].values):
    try:
        plt.scatter(point[0], point[1], alpha=value, facecolor=color,
                    marker=marker, s=s, label=feature)
Replacing those per-point calls with a single call on 1-D numpy arrays will definitely speed things up: one call creates a single collection artist instead of one artist per point.
The inner loop could be replaced with something like:
x = tsne[:, 0]  # is `tsne` an (n, 2) numpy array?
y = tsne[:, 1]
alpha_values = df[feature].values
try:
    plt.scatter(x, y, alpha=alpha_values, facecolor=color,
                marker=marker, s=s, label=feature)
except:
    pass
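One caveat: scatter only accepts an array for alpha in matplotlib 3.4 and later. On older versions you can get the same per-point transparency by baking the alpha values into an (n, 4) RGBA array. A minimal sketch, reusing x, y, alpha_values, color, marker, s, and feature from the snippet above:
import numpy as np
from matplotlib.colors import to_rgba

# Fallback for matplotlib < 3.4, where `alpha` must be a scalar:
# tile the base color into an (n, 4) array and overwrite its alpha channel.
rgba = np.tile(to_rgba(color), (len(alpha_values), 1))
rgba[:, 3] = np.clip(alpha_values, 0, 1)
plt.scatter(x, y, facecolor=rgba, marker=marker, s=s, label=feature)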
If things are still too slow for you, you could also switch over to datashading with HoloViews and Datashader, which rasterize the points into an image instead of drawing each one as a marker (see the sketch below). But try removing the inner for loop first, since that is definitely slowing you down a lot.
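A minimal sketch of the datashading route, assuming an (n, 2) t-SNE array like the one above; the random stand-in data and the output filename are illustrative, and it needs the holoviews and datashader packages installed:
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews.operation.datashader import datashade

hv.extension('bokeh')

# Hypothetical stand-in for the (n, 2) t-SNE array from the question.
tsne = np.random.randn(100_000, 2)
points = hv.Points(pd.DataFrame(tsne, columns=['x', 'y']))

# datashade aggregates the points into a fixed-size raster, so render
# time stays roughly flat no matter how many points there are.
hv.save(datashade(points), 'tsne_datashaded.html')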
Upvotes: 1