bicarlsen

Reputation: 1371

Slow matplotlib Plotting

I have a MultiIndexed pandas Series and am trying to plot each outer index level in its own subplot, but it runs very slowly.

To do the subplotting, I loop over the outer level of the MultiIndex and plot each sub-Series using the inner index level as the x coordinate.

def plot_series( data ):
    # create 16 subplots, corresponding to the 16 outer index levels
    fig, axs = plt.subplots( 4, 4 )

    for oi in data.index.get_level_values( 'outer_index' ):
        # calculate subplot to use
        row = int( oi/ 4 )
        col = int( oi - row* 4 )

        ax = axs[ row, col ]
        data.xs( oi ).plot( use_index = True, ax = ax )

    plt.show()

Each outer index level has 1000 data points, but the plotting takes several minutes to complete.

Is there a way to speed up the plotting?

Data

num_out = 16
num_in  = 1000

data = pd.Series( 
    data = np.random.rand( num_out* num_in ), 
    index = pd.MultiIndex.from_product( [ np.arange( num_out ), np.arange( num_in ) ], names = [ 'outer_index', 'inner_index' ] ) 
)

Upvotes: 0

Views: 175

Answers (1)

dubbbdan

Reputation: 2730

Rather than loop through data.index.get_level_values( 'outer_index' ), you could use data.groupby(level='outer_index') and iterate through the grouped object using:

for name, group in grouped:
   #do stuff 

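This avoids repeatedly slicing the Series with data.xs( oi ). Note that data.index.get_level_values( 'outer_index' ) returns one label per row (16,000 here) rather than the 16 unique labels, so the original loop slices and plots each outer level 1,000 times, whereas groupby visits each group exactly once.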
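To make the difference in loop counts concrete, here is a minimal sketch that builds the same data as in the question and compares how many iterations each approach performs. The variable names (labels_per_row, n_loop_iterations, n_groups, group_sizes) are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

num_out, num_in = 16, 1000
data = pd.Series(
    np.random.rand(num_out * num_in),
    index=pd.MultiIndex.from_product(
        [np.arange(num_out), np.arange(num_in)],
        names=["outer_index", "inner_index"],
    ),
)

# get_level_values returns one label per row, so a loop over it
# runs once for every data point, not once per outer level.
labels_per_row = data.index.get_level_values("outer_index")
n_loop_iterations = len(labels_per_row)

# groupby yields each outer label exactly once.
grouped = data.groupby(level="outer_index")
n_groups = grouped.ngroups

# Each group holds all rows that share one outer label.
group_sizes = {name: len(group) for name, group in grouped}

print(n_loop_iterations, n_groups)  # 16000 16
```

If you prefer to keep the data.xs( oi ) loop, iterating over the unique labels only (for example data.index.get_level_values( 'outer_index' ).unique()) should cut the iteration count in the same way.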

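import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_series(data):
    # each group holds all rows that share one outer index label
    grouped = data.groupby(level='outer_index')

    # create 16 subplots, one per outer index label
    fig, axs = plt.subplots( 4, 4 )
    for name, group in grouped:
        # map the outer index label (0..15) to a position in the 4x4 grid
        row, col = divmod( int( name ), 4 )
        ax = axs[ row, col ]
        group.plot( use_index = True, ax = ax )

    # show the figure once, after all subplots have been drawn
    plt.show()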



num_out = 16
num_in  = 1000

data = pd.Series( 
    data = np.random.rand( num_out* num_in ), 
    index = pd.MultiIndex.from_product( [ np.arange( num_out ), np.arange( num_in ) ], names = [ 'outer_index', 'inner_index' ] ) 
)

plot_series(data)

Using %timeit, you can see that this approach is much faster:

%timeit plot_series(data)
795 ms ± 252 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
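%timeit is an IPython magic. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The sketch below makes a few assumptions that are not in the answer above: it switches to the non-interactive Agg backend so nothing blocks on a window, it forces rendering with fig.canvas.draw() instead of calling plt.show(), and it drops the outer index level so the inner index is used as the x coordinate. The function name plot_series_grouped is hypothetical, and the numbers you get will differ from the 795 ms reported here.

```python
import timeit

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window opens
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_series_grouped(data):
    """Plot each outer index level once, in its own subplot."""
    fig, axs = plt.subplots(4, 4)
    for name, group in data.groupby(level="outer_index"):
        row, col = divmod(int(name), 4)
        # drop the outer level so the inner index becomes the x coordinate
        group.droplevel("outer_index").plot(ax=axs[row, col])
    fig.canvas.draw()  # force rendering so it is part of the measurement
    plt.close(fig)     # release the figure between runs


num_out, num_in = 16, 1000
data = pd.Series(
    np.random.rand(num_out * num_in),
    index=pd.MultiIndex.from_product(
        [np.arange(num_out), np.arange(num_in)],
        names=["outer_index", "inner_index"],
    ),
)

# three independent runs, one call each; report the fastest
times = timeit.repeat(lambda: plot_series_grouped(data), repeat=3, number=1)
print(f"best of 3: {min(times) * 1000:.0f} ms")
```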

Upvotes: 2
