tmasters

Reputation: 37

Can't find efficient way to replicate original code after refactoring of pandas .resample()

I am passing a dictionary of pandas DataFrames (or a pandas Panel) into the function below in order to convert daily data to monthly data. Each DataFrame represents one field (e.g. Open, High, Low or Close), with dates as the index and stock codes as the columns. The function works fine, but I am getting deprecation warnings, and I can't find an efficient implementation using the newly refactored resample(). Most of the examples I can find use agg() to apply different methods to different columns of a single DataFrame. My panel has a separate frame for each field, so this doesn't quite fit. I've tried apply() with a lambda and it works, but it is unreasonably slow. I'm sure there is an efficient implementation for this. I've noticed several questions that were answered based on the deprecated implementation, and a question similar to mine that has not yet been answered.

Here's my original function:

# function to convert daily data to monthly and return dictionary or panel
def to_monthly(fields, data_d, create_panel=True):

    how_dict={'Open':'first', 'High':'max', 'Low':'min', 'Close':'last'}

    data_m={}
    for field in fields:
        data_m[field]=data_d[field].resample(rule='M', how=how_dict[field]).ffill()
    if create_panel:
        data_m = pd.Panel(data_m)

    return data_m

This runs fine but I get the deprecation warnings. My attempt to solve this is:

# alternative function to handle refactoring of .resample()
def to_monthly(fields, data_d, create_panel=True):

    how_dict={
        'Open': (lambda x: x[0]),
        'High': (lambda x: x.max()),
        'Low': (lambda x: x.min()),
        'Close': (lambda x: x[-1]),
        'Volume': lambda x: x.sum()
        }

    data_m={}
    for field in fields:
        data_m[field]=data_d[field].resample('M').apply(how_dict[field]).ffill()
    if create_panel:
        data_m = pd.Panel(data_m)

    return data_m

I haven't found it easy to locate a replacement syntax for all the old "how" options. Some assistance on this would also be appreciated. The Pandas documentation doesn't always seem to provide all the options under a given field or usage. I've seen others have had similar problems.

Any help would be greatly appreciated

Thank you

Upvotes: 1

Views: 92

Answers (1)

Nickil Maveli

Reputation: 29711

You can use Resampler.aggregate (alias .agg). Keep a dict that maps each field name to its intended operation, and pass the operation for the current field as a string.

dict_ohlcv = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}
# inside the loop over fields; the parentheses let the method chain span several lines
data_m[field] = (data_d[field].resample('M')
                              .agg(dict_ohlcv[field])
                              .ffill())
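
For completeness, here is a minimal, self-contained sketch of the whole function along those lines. The toy tickers (AAA, BBB) and the generated prices are made up for illustration. I pass pd.offsets.MonthEnd() as the rule, which should be equivalent to 'M' here and avoids depending on the alias spelling, and I return a plain dict instead of a pd.Panel, since Panel was removed in pandas 0.25. Using pd.concat over the dict is one possible stand-in that gives (field, stock code) columns.

```python
import pandas as pd

# Field name -> name of the built-in reduction to use for that field
HOW = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}


def to_monthly(fields, data_d, rule=None):
    """Resample each field's daily frame (dates x stock codes) to month-end."""
    if rule is None:
        rule = pd.offsets.MonthEnd()  # assumption: month-end bins, like 'M' in the question
    data_m = {}
    for field in fields:
        data_m[field] = data_d[field].resample(rule).agg(HOW[field]).ffill()
    return data_m


# Hypothetical toy data: 60 daily rows covering January and February 2016
idx = pd.date_range('2016-01-01', periods=60, freq='D')
base = pd.DataFrame(
    {'AAA': [float(i) for i in range(60)],
     'BBB': [float(100 + i) for i in range(60)]},
    index=idx,
)
daily = {
    'Open': base,
    'High': base + 1,
    'Low': base - 1,
    'Close': base,
    'Volume': base * 0 + 10,  # constant volume of 10 per day
}

monthly = to_monthly(list(HOW), daily)

# Possible Panel stand-in: one wide frame with (field, stock code) columns
wide = pd.concat(monthly, axis=1)
```

String names such as 'first' or 'max' are dispatched to the built-in reducers, which is why this should be much faster than .apply with a Python lambda.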

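The deprecation warnings you get are due to an API-breaking change (in pandas 0.18.0) that made the .resample method more .groupby-like: it now returns a deferred Resampler object, and the how= argument is replaced by calling a method on that object or passing the operation to .agg. [source: Resample API]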
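
As for replacing the old "how" options in general: as far as I can tell, each how='name' string corresponds to a method of the same name on the Resampler, or equivalently to .agg('name'). A small sketch on a made-up daily Series (the data and the variable names are only for illustration):

```python
import pandas as pd

# Hypothetical daily series covering January and February 2016
idx = pd.date_range('2016-01-01', periods=60, freq='D')
s = pd.Series([float(i) for i in range(60)], index=idx)

# resample() only sets up the month-end bins; nothing is aggregated yet
r = s.resample(pd.offsets.MonthEnd())

# Old style: s.resample('M', how='max')
by_method = r.max()      # new style: call the method on the Resampler
by_name = r.agg('max')   # or pass the operation name to agg

# Old style: s.resample('M', how='ohlc')
bars = r.ohlc()          # one row per month with open/high/low/close columns
```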

Upvotes: 0
