nenno lillo
nenno lillo

Reputation: 637

python and dataframe: group by week and calculate the sum and difference

I have a dataframe with the following columns:

      DATE      ALFA    BETA
2016-04-26         1       3
2016-04-27         3       0
2016-04-28         0       8
2016-04-29         4       2
2016-04-30         3       1
2016-05-01        -2      -5
2016-05-02         3       0
2016-05-03         3       3
2016-05-08         1       7
2016-05-11         3       1
2016-05-12        10       1
2016-05-13         4       2

I would like to group the data in a weekly range but treat the alpha and beta columns differently. I would like to calculate the sum of the numbers in the ALFA column for each week while for the BETA column I would like to calculate the difference between the beginning and the end of the week. I show you an example of the expected result.

      DATE      sum_ALFA    diff_BETA
2016-04-26            12            3
2016-05-03             4            4
2016-05-11            17            1

I have tried this code but it calculates the sum for each column

df = df.resample('W', on='DATE').sum().reset_index().sort_values(by='DATE')

this is my dataset https://drive.google.com/uc?export=download&id=1fEqjINx9R5io7t_YxA9qShvNDxWRCUke

Upvotes: 2

Views: 1103

Answers (1)

Quickbeam2k1
Quickbeam2k1

Reputation: 5437

I'd guess I'm having a different locale here (hence my week is different), you can do:

df.resample("W", on="DATE",closed="left", label="left"
            ).agg({"ALFA":"sum", "BETA": lambda g: g.iloc[0] - g.iloc[-1]})
            ALFA  BETA
DATE
2016-04-24    11     2
2016-05-01     4    -8
2016-05-08    18     5

I think there is a solution for your data with my approach. Define

def get_series_first_minus_last(s):
    try:
        return s.iloc[0] - s.iloc[-1]
    except IndexError:
        return 0

and replace the lambda call just by the function call, i.e.

df.resample("W", on="DATE",closed="left", label="left"
            ).agg({"ALFA":"sum", "BETA": get_series_first_minus_last})

Note that in the newly defined function, you could also return nan if you'd prefer that.

Upvotes: 2

Related Questions