Reputation: 637
I have a dataframe with the following columns:
DATE ALFA BETA
2016-04-26 1 3
2016-04-27 3 0
2016-04-28 0 8
2016-04-29 4 2
2016-04-30 3 1
2016-05-01 -2 -5
2016-05-02 3 0
2016-05-03 3 3
2016-05-08 1 7
2016-05-11 3 1
2016-05-12 10 1
2016-05-13 4 2
I would like to group the data in a weekly range but treat the alpha and beta columns differently. I would like to calculate the sum of the numbers in the ALFA
column for each week while for the BETA
column I would like to calculate the difference between the beginning and the end of the week. I show you an example of the expected result.
DATE sum_ALFA diff_BETA
2016-04-26 12 3
2016-05-03 4 4
2016-05-11 17 1
I have tried this code but it calculates the sum for each column
df = df.resample('W', on='DATE').sum().reset_index().sort_values(by='DATE')
this is my dataset https://drive.google.com/uc?export=download&id=1fEqjINx9R5io7t_YxA9qShvNDxWRCUke
Upvotes: 2
Views: 1103
Reputation: 5437
I'd guess I'm having a different locale here (hence my week is different), you can do:
df.resample("W", on="DATE",closed="left", label="left"
).agg({"ALFA":"sum", "BETA": lambda g: g.iloc[0] - g.iloc[-1]})
ALFA BETA
DATE
2016-04-24 11 2
2016-05-01 4 -8
2016-05-08 18 5
I think there is a solution for your data with my approach. Define
def get_series_first_minus_last(s):
try:
return s.iloc[0] - s.iloc[-1]
except IndexError:
return 0
and replace the lambda call just by the function call, i.e.
df.resample("W", on="DATE",closed="left", label="left"
).agg({"ALFA":"sum", "BETA": get_series_first_minus_last})
Note that in the newly defined function, you could also return nan if you'd prefer that.
Upvotes: 2