Reputation: 11640
Although there are many matplotlib optimization posts around, I couldn't find the exact tips I need in them, for example "Matplotlib slow with large data sets, how to enable decimation?" and "Matplotlib - Fast way to create many subplots?".
My problem is that I have cached CSV files of time-series data (40 of them). I'd like to plot them in one plot with 40 subplots in a vertical series, and output them to a single rasterized image.
My code using matplotlib is as follows:
def _Draw(self):
    """Output a graph of subplots."""
    BigFont = 10
    # Prepare subplots.
    nFiles = len(self.inFiles)
    fig = plt.figure()
    plt.axis('off')
    for i, f in enumerate(self.inFiles[0:3]):
        pltTitle = '{}:{}'.format(i, f)
        colorFile = self._GenerateOutpath(f, '_rgb.csv')
        data = np.loadtxt(colorFile, delimiter=Separator)
        nRows = data.shape[0]
        ind = np.arange(nRows)
        vals = np.ones((nRows, 1))
        ax = fig.add_subplot(nFiles, 1, i+1)
        ax.set_title(pltTitle, fontsize=BigFont, loc='left')
        ax.axis('off')
        ax.bar(ind, vals, width=1.0, edgecolor='none', color=data)
    figout = plt.gcf()
    plt.savefig(self.args.outFile, dpi=300, bbox_inches='tight')
The script hangs for the whole night. Each of my data files is a matrix of roughly 10,000 x 3 to 30,000 x 3.
In my case, I don't think I can use memmapfile to avoid hogging memory, because the subplots seem to be the problem here, not the data imported in each loop iteration.
I have no idea where to start optimizing this workflow. I could, however, forget about subplots, generate one plot image per data file, and stitch the 40 images together later, but that is not ideal.
Is there an easy way in matplotlib to do this?
Upvotes: 1
Views: 747
Reputation: 284750
Your problem is the way you're plotting your data.
Using bar to plot tens of thousands of bars of exactly the same size is very inefficient compared to using imshow to accomplish the same thing.
For example:
import numpy as np
import matplotlib.pyplot as plt
# Random r,g,b data similar to what you seem to be loading in....
data = np.random.random((30000, 3))
# Make data a 1 x size x 3 array
data = data[None, ...]
# Plotting using `imshow` instead of `bar` will be _much_ faster.
fig, ax = plt.subplots()
ax.imshow(data, interpolation='nearest', aspect='auto')
plt.show()
This should be essentially equivalent to what you're currently doing, but will draw much faster and use less memory.
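To connect this back to the 40-subplot layout in the question, here is a minimal sketch of the same imshow approach applied to a vertical stack of axes and saved straight to a raster file. The function name draw_color_strips, the synthetic random data standing in for the cached CSV files, the titles, the figure size, the dpi, and the output path are all assumptions made for illustration; they are not taken from the original script.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for writing image files
import numpy as np
import matplotlib.pyplot as plt


def draw_color_strips(datasets, titles, out_path, dpi=100):
    """Draw one imshow strip per (N, 3) RGB array, stacked vertically.

    Hypothetical helper: returns the number of strips drawn.
    """
    n = len(datasets)
    # squeeze=False keeps `axes` two-dimensional even when n == 1.
    fig, axes = plt.subplots(n, 1, figsize=(8, 0.5 * n), squeeze=False)
    for ax, data, title in zip(axes[:, 0], datasets, titles):
        # Reshape (N, 3) -> (1, N, 3) so imshow treats it as a 1-pixel-high RGB image.
        ax.imshow(data[None, ...], interpolation='nearest', aspect='auto')
        ax.set_title(title, fontsize=10, loc='left')
        ax.axis('off')
    fig.savefig(out_path, dpi=dpi, bbox_inches='tight')
    plt.close(fig)  # release the figure's memory once it is written
    return n


# Synthetic stand-ins for the 40 cached CSV files (assumed RGB values in [0, 1]).
rng = np.random.default_rng(0)
datasets = [rng.random((int(rng.integers(10000, 30001)), 3)) for _ in range(40)]
titles = ['{}:file_{}.csv'.format(i, i) for i in range(40)]

out_path = os.path.join(tempfile.gettempdir(), 'strips.png')
n_drawn = draw_color_strips(datasets, titles, out_path)
```

In the real script, each entry of datasets would come from np.loadtxt as in the question. Each subplot then holds a single image artist rather than tens of thousands of rectangle patches, which is where the time should be saved.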
Upvotes: 2