Reputation: 51
Hello dear Community,
I haven't found something similar during my search and hope I haven't overseen anything. I have the following issue:
I have a big dataset whichs shape is 1352x121797 (1353 samples and 121797 time points). Now I have clustered these and would like to generate one plot for each cluster in which every time series for this cluster is plotted.
However, when using the matplotlib syntax it is like super extremely slow (and I'm not exactly sure where that comes from). Even after 5-10 minutes it hasn't finished.
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
for index, values in subset_cluster.iterrows(): # One Cluster subset, dataframe of shape (11x121797)
ax.plot(values)
fig.savefig('test.png')
Even, when inserting a break after ax.plot(values)
it still doesn't finish. I'm using Spyder and thought that it might be due to Spyder always rendering the plot inline in the console.
However, when simply using the pandas method of the Series values.plot()
instead of ax.plot(values)
the plot appears and is saved in like 1-2 seconds.
As I need the customization options of matplotlib for standardizing all the plots and make them look a little bit prettier I would love to use the matplotlib syntax. Anyone has any ideas?
Thanks in advance
Edit: so while trying around a little bit it seems, that the rendering is the time-consuming part. When ran with the backend matplotlib.use('Agg')
, the plot command runs through quicker (if using plt.plot()
instead of ax.plot()
), but plt.savefig()
then takes forever. However, still it should be in a considerable amount of time right? Even for 121xxx data points.
Upvotes: 4
Views: 3322
Reputation: 268
Posting as answer as it may help OP or someone else: I had the same problem and found out that it was because the data I was using as x-axis was an Object, while the y-axis data was float64. After explicitly setting the object to DateTime, plotting With Matplotlib went as fast as Pandas' df.plot(). I guess that Pandas does a better job at understanding the data type when plotting.
OP, you might want to check if the values you are plotting are in the right type, or if, like me, you had some problems when loading the dataframe from file.
Upvotes: 4