Thomas

Reputation: 12127

example of pandas resampling syntax

I have this:

columns_dict = {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum'}
daily_data = hourly_data.resample('1D').agg(columns_dict)

and I get this warning:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version.

For column-specific groupby renaming, use named aggregation

>>> df.groupby(...).agg(name=('column', aggfunc))

but I don't quite understand the syntax; does it expect something like this:

daily_data = hourly_data.resample('1D').agg(name=('open', first)).agg(name=('high', max))

Edit:

Following the docs that BallpointBen pointed out (https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#aggregation), I came up with the following:

    daily_data = hourly_data.resample('1D').agg(
        open=pd.NamedAgg(column='open', aggfunc='first'),
        high=pd.NamedAgg(column='high', aggfunc='max'),
        low=pd.NamedAgg(column='low', aggfunc='min'),
        close=pd.NamedAgg(column='close', aggfunc='last'),
        volume=pd.NamedAgg(column='volume', aggfunc='sum'))

but then this won't work:

aggregate() missing 1 required positional argument: 'func'

so, I went back to the docs, looking for the aggregation and found this: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html#pandas.DataFrame.agg

so I made the following:

    daily_data = hourly_data.resample('1D').agg(
        {
            'open': ['open', 'first'],
            'high': ['high', 'max'],
            'low': ['low', 'min'],
            'close': ['close', 'last'],
            'volume': ['volume', 'sum']
        }) 

but then it crashes with:

<class 'tuple'>: (<class 'KeyError'>, KeyError("Column 'open' does not exist!"), <traceback object at 0x110143820>)

So I don't understand the error messages; they read as if they were written for people who already know pandas well. I don't understand the aggregation documentation either. The first link is the groupby documentation, and I'm not sure how much of it applies to resampling or whether aggregation uses the same syntax everywhere. The groupby aggregation docs also show a different syntax from the page dedicated to aggregation.


edit:

this won't work either:

    daily_data = hourly_data.resample('1D').agg(
            ('open', 'first'),
            ('high', 'max'),
            ('low', 'min'),
            ('close', 'last'),
            ('volume', 'sum')
        )

nor does that:

    daily_data = hourly_data.resample('1D').agg(
            open=('open', 'first'),
            high=('high', 'max'),
            low=('low', 'min'),
            close=('close', 'last'),
            volume=('volume', 'sum')
        )

All the examples I find on the web use the old syntax, including all the ones on SO.

The pandas docs have resample examples here:

and aggregation samples there:

and none of them show an example of the new syntax either.

SIA recommended passing named tuples, which makes sense, but I can't find anywhere in the docs what the field names should be. And, looking at the warning text:

df.groupby(...).agg(name=('column', aggfunc))

the whole tuple is named, and I still have no clue what the name is supposed to represent. A column name? Some operation name? Again, no clear documentation.
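For reference, in named aggregation the keyword is the name of the output column, and the tuple is (input column, aggregation function). Below is a minimal sketch with made-up hourly data. It uses groupby with pd.Grouper(freq='1D') in place of resample, since named aggregation is documented for groupby. The sample values and the output name total_volume are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly OHLCV data: two full days, one row per hour.
index = pd.date_range("2021-01-01", periods=48, freq="60min")
hours = np.arange(48, dtype=float)
hourly_data = pd.DataFrame(
    {
        "open": hours,
        "high": hours + 1,
        "low": hours - 1,
        "close": hours + 0.5,
        "volume": np.ones(48),
    },
    index=index,
)

# Each keyword is the name of an OUTPUT column; each tuple is
# (input column, aggregation function).
daily_data = hourly_data.groupby(pd.Grouper(freq="1D")).agg(
    open=("open", "first"),
    high=("high", "max"),
    low=("low", "min"),
    close=("close", "last"),
    total_volume=("volume", "sum"),  # output name differs from the input column
)
print(daily_data)
```

The last keyword shows where the "name" matters: the input column is volume, but the result column is called total_volume.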

but then, looking at the text of the warning:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version.

For column-specific groupby renaming, use named aggregation

where does renaming come into play? I'm doing resampling, I don't want to rename anything, so I don't understand the message here either.
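One possible reading, offered as an assumption rather than a confirmed diagnosis: older pandas versions appear to emit this warning when some keys of the dict are not existing columns, and they then treat those keys as new names. The KeyError above ("Column 'open' does not exist!") points the same way. The sketch below checks the dict keys against the columns. The capitalised column names are hypothetical, chosen only to reproduce a mismatch.

```python
import pandas as pd

columns_dict = {"open": "first", "high": "max", "low": "min",
                "close": "last", "volume": "sum"}

# Hypothetical frame whose column names differ from the dict keys only by case.
index = pd.date_range("2021-01-01", periods=4, freq="60min")
hourly_data = pd.DataFrame(
    {
        "Open": [1.0, 2.0, 3.0, 4.0],
        "High": [2.0, 3.0, 4.0, 5.0],
        "Low": [0.5, 1.5, 2.5, 3.5],
        "Close": [1.5, 2.5, 3.5, 4.5],
        "Volume": [10, 20, 30, 40],
    },
    index=index,
)

# Dict keys that are not existing columns cannot be aggregated as-is.
missing = [key for key in columns_dict if key not in hourly_data.columns]
print(missing)

# Assumed fix for this hypothetical case: normalise the column names first,
# then the plain dict form works without any renaming.
fixed = hourly_data.rename(columns=str.lower)
daily_data = fixed.resample("1D").agg(columns_dict)
print(daily_data)
```

If missing comes back empty for the real data, this explanation does not apply and the cause lies elsewhere.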

trying to be creative, I even went for that:

    daily_data = hourly_data.resample('1D').agg(
        [
            namedtuple('open', pd.NamedAgg(column='open', aggfunc='first')),
            namedtuple('high', pd.NamedAgg(column='high', aggfunc='max')),
            namedtuple('low', pd.NamedAgg(column='low', aggfunc='min')),
            namedtuple('close', pd.NamedAgg(column='close', aggfunc='last')),
            namedtuple('volume', pd.NamedAgg(column='volume', aggfunc='sum'))
        ]) 

with the same result.

Upvotes: 1

Views: 811

Answers (1)

user62009

Reputation: 1

This works:

daily_data = hourly_data.resample('1D').agg({
            'open': 'first',
            'high': 'max',
            'low': 'min',
            'close': 'last',
            'volume': 'sum',
        })
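# Flattening is only needed when agg returns MultiIndex columns
# (e.g. several functions for one column); with one function per column they are already flat.
if daily_data.columns.nlevels > 1:
    daily_data.columns = daily_data.columns.map('_'.join)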

You can also apply several functions to a single column if needed:

def foo(x):
    # x holds the values of one resampling bin; an aggregation function must return a single value
    return x.max() * 100
daily_data = hourly_data.resample('1D').agg({
            'open': 'first',
            'high': ['max', 'mean', foo],
            'low': 'min',
            'close': 'last',
            'volume': 'sum',
        })
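When at least one column gets a list of functions, the result has two-level (MultiIndex) columns, and that is the case where joining the levels with '_' is useful. Here is a small sketch with made-up data. The helper scaled_max is hypothetical and returns one scalar per bin.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data: two full days, one row per hour.
index = pd.date_range("2021-01-01", periods=48, freq="60min")
hours = np.arange(48, dtype=float)
hourly_data = pd.DataFrame({"open": hours, "high": hours + 1}, index=index)


def scaled_max(x):
    # x is the Series of values in one daily bin; return a single scalar.
    return x.max() * 100


daily_data = hourly_data.resample("1D").agg(
    {"open": "first", "high": ["max", "mean", scaled_max]}
)

# The columns are now (column, function) pairs; join them into flat names.
daily_data.columns = daily_data.columns.map("_".join)
print(daily_data)
```

The flattened names combine the input column and the function name, for example high_mean.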

Upvotes: 0
