How to speed up matplotlib, subplot plotting/drawing and saving?

Question

I want to plot a fairly small IoT CSV dataset, about 2 GB. Its dimensions are roughly (20,000, 18,000). Each column should become a subplot with its own y-axis. I use the following code to generate the picture:

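import random

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

name = 'subplots.png'  # output file name; placeholder, not given in the original post

times = pd.date_range('2012-10-01', periods=2000, freq='2min')
timeseries_array = np.array(times)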
cols = random.sample(range(1, 2001), 2000)
values = []
for col in cols:
    values.append(random.sample(range(1,2001), 2000))

time = pd.DataFrame(data=timeseries_array, columns=['date'])
graph = pd.DataFrame(data=values, columns=cols, index=timeseries_array)

fig, axarr = plt.subplots(len(graph.columns), sharex=True, sharey=True,
                          constrained_layout=True, figsize=(50, 50))
fig.autofmt_xdate()

for i, ax in enumerate(axarr):
    ax.plot(time['date'], graph[graph.columns[i]].values)
    ax.set(ylabel=graph.columns[i])
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    myFmt = mdates.DateFormatter('%d.%m.%Y %H:%M')
    ax.xaxis.set_major_formatter(myFmt)
    ax.label_outer()

print('--save-fig--')
plt.savefig(name, dpi=500)
plt.close()

But this is incredibly slow: 100 subplots took about 1 minute, and 2000 took around 20 minutes. My machine has 10 cores and 35 GB of RAM. Do you have any hints for speeding this up? Is multithreading possible? As far as I can see, this only uses one core. Are there tricks to draw only the relevant things? Or is there an alternative way to draw this plot faster, all in one figure without subplots?

lkaupp · Accepted Answer

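Thanks to @Asmus, I came up with this solution, which brought me down from 20 minutes to 40 seconds for (2000, 2000). Instead of creating one subplot per column, it draws every column into a single axes and shifts each one along the y-axis. I did not find any well-documented solution for beginners like me, so I am posting mine here. It is meant for time series with a huge number of columns: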

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

def print_image_fast(name="default.png", graph=None):
    # graph must be a pandas DataFrame indexed by timestamps, one series per column
    int_columns = len(graph.columns)
    # Grow the figure by 30 inches per 1000 columns; this works well with
    # 500 dpi, label size 2 and line width 0.1
    y_size = (int_columns / 1000) * 30
    fig = plt.figure(figsize=(10, y_size))
    ax = fig.add_subplot(1, 1, 1)
    # Date formatter for the time series on the x-axis
    myFmt = mdates.DateFormatter('%d.%m.%Y %H:%M')
    ax.xaxis.set_major_formatter(myFmt)
    # y positions of the column labels
    y_label_offsets = []
    current = 0
    for i, col in enumerate(graph.columns):
        # Maximum of the previous column, i.e. its height on the y-axis
        last = current
        # Maximum of the current column, i.e. its height on the y-axis
        # (this stacking assumes the values are non-negative)
        current = np.amax(graph[col].values)

        if i == 0:
            # The first column is drawn without any shift
            y_offset = 0
        else:
            # New zero line for this column: the accumulated offset of all
            # columns before it, plus the previous column's height, plus 1
            y_offset = y_offset + last + 1

        # Put the label halfway up the current column
        y_offset_label = y_offset + (current / 2)
        y_label_offsets.append(y_offset_label)
        # Plot the column shifted by its offset
        ax.plot(graph.index.values, graph[col].values + y_offset,
                'r-o', ms=0.1, mew=0, mfc='r', linewidth=0.1)

    # y limits: from 0 up to the last offset plus the height of the last column
    ax.set_ylim([0, y_offset+current])
    # x limits: first and last timestamp
    ax.set_xlim([graph.index.values[0], graph.index.values[-1]])

    # Use the column names as y tick labels at the computed positions
    plt.yticks(y_label_offsets, graph.columns, fontsize=2)
    # Time labels on the x-axis
    plt.xticks(fontsize=15, rotation=90)

    plt.savefig(name, dpi=500)
    plt.close()
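
The offset bookkeeping in the loop above can also be computed in one go with NumPy. This is only a sketch of the same arithmetic, not part of my original function; the helper name stacked_offsets and the small demo array are made up for illustration, and it assumes non-negative values just like the loop does:

```python
import numpy as np

def stacked_offsets(values):
    """Return (offsets, label_positions, y_top) for a 2D array (rows x columns).

    Hypothetical helper: reproduces the y_offset / y_offset_label logic of the
    loop above without iterating over the columns in Python.
    """
    values = np.asarray(values, dtype=float)
    # Height of every column (its maximum value)
    col_max = values.max(axis=0)
    # Offset of column i = sum over all earlier columns of (max + 1)
    offsets = np.concatenate(([0.0], np.cumsum(col_max[:-1] + 1)))
    # Labels sit halfway up each shifted column
    label_positions = offsets + col_max / 2
    # Upper y limit: last offset plus the height of the last column
    y_top = offsets[-1] + col_max[-1]
    return offsets, label_positions, y_top

# Tiny demo: 2 rows, 3 columns with maxima 3, 4 and 2
demo = np.array([[1, 2, 0],
                 [3, 4, 2]])
offsets, label_positions, y_top = stacked_offsets(demo)
```

Since ax.plot draws each column of a 2D y array as its own line, something like ax.plot(graph.index.values, graph.values + offsets, ...) should then replace the per-column plot calls. I have not timed that variant.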

Edit: For anybody interested, a DataFrame with shape (20k, 20k) takes up around 20 GB of my RAM. I also had to change savefig to SVG output, because Agg can't handle dimensions greater than 2^16 pixels.
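
To see in advance whether a figure will hit that limit, you can multiply the figure size in inches by the dpi. A rough sketch, assuming the limit applies per dimension and reusing the sizing rule from the function above (10 inches wide, 30 inches of height per 1000 columns, 500 dpi); the helper names are made up for illustration:

```python
AGG_MAX_PIXELS = 2 ** 16  # per-dimension limit of the Agg backend, as noted above

def figure_height_inches(n_columns):
    # Same sizing rule as in print_image_fast: 30 inches per 1000 columns
    return (n_columns / 1000) * 30

def choose_format(width_in, height_in, dpi):
    """Hypothetical helper: pick 'png' if the raster fits into Agg, else 'svg'."""
    width_px = width_in * dpi
    height_px = height_in * dpi
    if max(width_px, height_px) >= AGG_MAX_PIXELS:
        return "svg"
    return "png"

# 2000 columns: 60 in * 500 dpi = 30,000 px, fits into Agg
small = choose_format(10, figure_height_inches(2000), 500)
# 20,000 columns: 600 in * 500 dpi = 300,000 px, too tall for Agg
large = choose_format(10, figure_height_inches(20000), 500)
```

With these numbers the (2000, 2000) case stays a PNG, while the 20k-column case has to go to SVG (or a lower dpi).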
