Ravaal

Reputation: 3359

How do I speed up applying a function to a large pandas dataframe?

Yesterday I started applying a function to a fairly large dataset (6 million rows), but it is taking forever. I also tried pandarallel, but that isn't working well either. Here is the code I'm using:

import numpy as np  # needed for np.inf below

def classifyForecast(dataframe):

    buckets = len(dataframe[dataframe['QUANTITY'] != 0])

    try:
        adi = dataframe.shape[0] / buckets
        cov = dataframe['QUANTITY'].std() / dataframe['QUANTITY'].mean()

        if adi < 1.32:
            if cov < .49:
                dataframe['TYPE'] = 'Smooth'
            else:
                dataframe['TYPE'] = 'Erratic'
        else:
            if cov < .49:
                dataframe['TYPE'] = 'Intermittent'
            else:
                dataframe['TYPE'] = 'Lumpy'

    except:
        dataframe['TYPE'] = 'Smooth'
    
    try:
        dataframe['ADI'] = adi
    except:
        dataframe['ADI'] = np.inf
    try:
        dataframe['COV'] = cov
    except:
        dataframe['COV'] = np.inf
    

    return dataframe

from pandarallel import pandarallel

pandarallel.initialize()

def quick_classification(df):
    return df.parallel_apply(classifyForecast(df))

Also, please note that I am splitting the dataframe into batches. I don't want the function to work on each row; I want it to work on each chunk, so that I can get the .mean() and .std() of specific columns within that chunk.
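For reference, the per-chunk statistics described above can also be computed without calling a Python function once per chunk. The following is only a minimal sketch under assumptions: the column that defines a chunk is not named in the question, so `ITEM` is a hypothetical key, and `classify_groups` and the `demo` frame are made-up names for illustration. It uses `groupby(...).transform` to broadcast each chunk's statistics back onto its rows and `np.select` to assign the label, and it tries to mirror the fallback in `classifyForecast` (a chunk with no non-zero quantities gets 'Smooth' with infinite ADI and COV).

```python
import numpy as np
import pandas as pd

def classify_groups(df, key="ITEM"):
    # "ITEM" is a hypothetical grouping column; use whatever defines a chunk.
    grouped = df.groupby(key)["QUANTITY"]

    # Rows per chunk and non-zero rows per chunk, broadcast back to every row.
    n_rows = grouped.transform("size")
    n_nonzero = df["QUANTITY"].ne(0).groupby(df[key]).transform("sum")

    # pandas returns inf for division by zero instead of raising.
    adi = n_rows / n_nonzero
    cov = grouped.transform("std") / grouped.transform("mean")

    # Chunks with no non-zero demand: same fallback as the original function.
    empty = n_nonzero.eq(0)
    cov = cov.mask(empty, np.inf)

    # np.select picks the first matching condition for each row.
    conditions = [
        empty,
        (adi < 1.32) & (cov < 0.49),
        adi < 1.32,
        cov < 0.49,
    ]
    choices = ["Smooth", "Smooth", "Erratic", "Intermittent"]

    out = df.copy()
    out["TYPE"] = np.select(conditions, choices, default="Lumpy")
    out["ADI"] = adi
    out["COV"] = cov
    return out

# Made-up demo data: three chunks with different demand patterns.
demo = pd.DataFrame({
    "ITEM": ["A"] * 4 + ["B"] * 4 + ["C"] * 2,
    "QUANTITY": [10, 10, 11, 10, 0, 0, 0, 5, 0, 0],
})
result = classify_groups(demo)
print(result)
```

Because every step is a single vectorized pass over the whole frame, the cost no longer grows with the number of Python-level function calls, which is usually what dominates when there are many small chunks.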

It shouldn't take 48 hours to complete. How do I speed this up?

Upvotes: 1

Views: 300

Answers (1)

Wouter

Reputation: 3261

It looks like mean and std are the only real calculations here, so I'm guessing they are the bottleneck.

You could try speeding it up with numba.

from numba import njit
import numpy as np

@njit(parallel=True)
def numba_mean(x):
    return np.mean(x)

@njit(parallel=True)
def numba_std(x):
    return np.std(x)

# Pass the underlying NumPy array; numba's nopython mode does not accept a pandas Series
quantity = dataframe['QUANTITY'].values
cov = numba_std(quantity) / numba_mean(quantity)
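One thing to watch when swapping pandas' `.std()` for `np.std`: pandas defaults to the sample standard deviation (`ddof=1`), while NumPy defaults to the population standard deviation (`ddof=0`). As far as I know, numba's `np.std` only accepts the array argument, so the sketch below assumes the correction is applied afterwards by rescaling. It uses plain NumPy (no numba) and made-up values, purely to show the difference.

```python
import numpy as np
import pandas as pd

# Made-up quantities for illustration.
quantity = np.array([0.0, 0.0, 0.0, 5.0])
n = quantity.size

population_std = np.std(quantity)        # ddof=0, NumPy's default
sample_std = pd.Series(quantity).std()   # ddof=1, pandas' default

# Rescale the ddof=0 result so it matches pandas' ddof=1 result.
rescaled = population_std * np.sqrt(n / (n - 1))

print(population_std, sample_std, rescaled)
```

For large chunks the two values are nearly identical, but for small chunks the difference can move COV across the 0.49 threshold, so it is worth deciding which definition you want.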

Upvotes: 1
