kakyo

Reputation: 11640

Matplotlib slow when plotting pre-cached data into many subplots

Although there are many matplotlib optimization posts around, such as "Matplotlib slow with large data sets, how to enable decimation?" and "Matplotlib - Fast way to create many subplots?", I couldn't find tips that fit my exact case.

My problem is that I have cached CSV files of time-series data (40 of them). I'd like to plot them in one plot with 40 subplots in a vertical series, and output them to a single rasterized image.

My code using matplotlib is as follows:

def _Draw(self):
    """Output a graph of subplots."""
    BigFont = 10
    # Prepare subplots.
    nFiles = len(self.inFiles)
    fig = plt.figure()
    plt.axis('off')
    for i, f in enumerate(self.inFiles[0:3]):
        pltTitle = '{}:{}'.format(i, f)
        colorFile = self._GenerateOutpath(f, '_rgb.csv')
        data = np.loadtxt(colorFile, delimiter=Separator)
        nRows = data.shape[0]
        ind = np.arange(nRows)
        vals = np.ones((nRows, 1))
        ax = fig.add_subplot(nFiles, 1, i+1)
        ax.set_title(pltTitle, fontsize=BigFont, loc='left')
        ax.axis('off')
        ax.bar(ind, vals, width=1.0, edgecolor='none', color=data)
    figout = plt.gcf()
    plt.savefig(self.args.outFile, dpi=300, bbox_inches='tight')

The script hangs for the whole night. Each of my data files is a matrix of roughly 10,000 x 3 to 30,000 x 3 values.

In my case, I don't think something like memmapfile would help with memory use, because the subplots seem to be the problem here, not the data loaded in each loop iteration.

I have no idea where to start optimizing this workflow. I could forget about subplots, generate one plot image per data file, and stitch the 40 images together later, but that is not ideal.

Is there an easy way in matplotlib to do this?

Upvotes: 1

Views: 747

Answers (1)

Joe Kington

Reputation: 284750

Your problem is the way you're plotting your data.

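Using bar to plot tens of thousands of bars of exactly the same size is very inefficient compared to using imshow to accomplish the same thing: bar creates a separate rectangle patch for every bar, while imshow draws the whole strip as a single image.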

For example:

import numpy as np
import matplotlib.pyplot as plt

# Random r,g,b data similar to what you seem to be loading in....
data = np.random.random((30000, 3))

# Make data a 1 x size x 3 array
data = data[None, ...]

# Plotting using `imshow` instead of `bar` will be _much_ faster.
fig, ax = plt.subplots()
ax.imshow(data, interpolation='nearest', aspect='auto')
plt.show()


This should be essentially equivalent to what you're currently doing, but will draw much faster and use less memory.
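To carry this over to the stacked layout from the question, something along these lines might work. It is only a sketch: draw_strips is a hypothetical helper, the random arrays stand in for the cached CSV files, and the output path, figure size and hspace value are assumptions you would tune for your own data.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # file output only, so no GUI backend is needed
import matplotlib.pyplot as plt
import numpy as np


def draw_strips(arrays, titles, title_size=10):
    """Draw each (n, 3) RGB array as one image strip in its own subplot.

    Hypothetical helper, not part of matplotlib.
    """
    n = len(arrays)
    fig, axes = plt.subplots(
        nrows=n, ncols=1, squeeze=False,
        figsize=(8, 0.6 * n),         # assumed size: 0.6 inch of height per strip
        gridspec_kw={"hspace": 1.5},  # leave room above each strip for its title
    )
    for ax, rgb, title in zip(axes[:, 0], arrays, titles):
        # Reshape (n, 3) to (1, n, 3): one row of n RGB pixels.
        ax.imshow(np.asarray(rgb)[None, ...],
                  interpolation="nearest", aspect="auto")
        ax.set_title(title, fontsize=title_size, loc="left")
        ax.axis("off")
    return fig


# Stand-in data: 40 arrays of 10,000 to 30,000 RGB rows with values in [0, 1).
rng = np.random.default_rng(0)
arrays = [rng.random((int(rng.integers(10_000, 30_001)), 3)) for _ in range(40)]
titles = ["{}:file_{}".format(i, i) for i in range(len(arrays))]

fig = draw_strips(arrays, titles)

# Assumed output location; replace with your own path.
out_path = os.path.join(tempfile.gettempdir(), "strips.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
```

Each subplot then holds a single image artist instead of tens of thousands of patches, so the number of subplots itself should no longer be the bottleneck.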

Upvotes: 2
