Reputation: 345
I am trying to group a pandas DataFrame into buckets of 2 days. For example, if I run the code below:
import pandas as pd
df = pd.DataFrame()
df['action_date'] = ['2017-01-01', '2017-01-01', '2017-01-03', '2017-01-04', '2017-01-04', '2017-01-05', '2017-01-06']
df['action_date'] = pd.to_datetime(df['action_date'], format="%Y-%m-%d")
df['user_name'] = ['abc', 'wdt', 'sdf', 'dfe', 'dsd', 'erw', 'fds']
df['number_of_apples'] = [1,2,3,4,5,6,2]
df = df.groupby(['action_date', 'number_of_apples']).sum()
I get a dataframe grouped by action_date with number_of_apples per day.
However, if I wanted to look at the dataframe in chunks of 2 days, how could I do so? I would then like to analyze the number_of_apples per date_chunk, either by making new dataframes for the dates 2017-01-01 & 2017-01-03, another for 2017-01-04 & 2017-01-05, and then one last one for 2017-01-06, OR just by regrouping and working within.
EDIT: I ultimately would like to make lists of users based on the number of apples they have for each day chunk, so I do not want the sum or the mean of each day chunk's apples. Sorry for the confusion!
Thank you in advance!
Upvotes: 1
Views: 698
Reputation: 4652
Try using pd.Grouper to group by two days (the older pd.TimeGrouper was deprecated and then removed in pandas 1.0).
>>> df.index = df.action_date
>>> dg = df.groupby(pd.Grouper(freq='2D'))['user_name'].apply(list)  # 2-day frequency
>>> dg.head()
action_date
2017-01-01         [abc, wdt]
2017-01-03    [sdf, dfe, dsd]
2017-01-05         [erw, fds]
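If separate DataFrames per chunk are preferred, as the question mentions, iterating over the same kind of grouper is one option. This is only a minimal sketch: it assumes the question's original df (before its groupby/sum step) and a pandas version that provides pd.Grouper, and the name chunks is purely illustrative.

```python
import pandas as pd

# Rebuild the question's data so the example runs on its own.
df = pd.DataFrame({
    'action_date': pd.to_datetime(['2017-01-01', '2017-01-01', '2017-01-03', '2017-01-04',
                                   '2017-01-04', '2017-01-05', '2017-01-06']),
    'user_name': ['abc', 'wdt', 'sdf', 'dfe', 'dsd', 'erw', 'fds'],
    'number_of_apples': [1, 2, 3, 4, 5, 6, 2],
})

# One DataFrame per 2-day calendar bin, keyed by the bin's start date.
# Bins that contain no rows are filtered out.
chunks = {
    start: chunk
    for start, chunk in df.groupby(pd.Grouper(key='action_date', freq='2D'))
    if not chunk.empty
}

for start, chunk in chunks.items():
    print(start.date(), chunk['user_name'].tolist())
```

Each value in chunks is an ordinary DataFrame, so it can be analyzed further without re-aggregating.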
Upvotes: 1
Reputation: 862511
You can use resample:
print(df.resample('2D', on='action_date')['number_of_apples'].sum().reset_index())
  action_date  number_of_apples
0  2017-01-01                 3
1  2017-01-03                12
2  2017-01-05                 8
EDIT:
print(df.resample('2D', on='action_date')['user_name'].apply(list).reset_index())
  action_date        user_name
0  2017-01-01       [abc, wdt]
1  2017-01-03  [sdf, dfe, dsd]
2  2017-01-05       [erw, fds]
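Note that '2D' bins by calendar time, so 2017-01-02 takes up a slot even though it has no rows. The question's own example instead pairs up the dates that actually occur (2017-01-01 & 2017-01-03, then 2017-01-04 & 2017-01-05, then 2017-01-06). If that is what is wanted, one possible approach is to number the distinct dates and integer-divide by 2. This is a sketch under that reading of the question; the column name date_chunk is made up for illustration.

```python
import pandas as pd

# Rebuild the question's data so the example runs on its own.
df = pd.DataFrame({
    'action_date': pd.to_datetime(['2017-01-01', '2017-01-01', '2017-01-03', '2017-01-04',
                                   '2017-01-04', '2017-01-05', '2017-01-06']),
    'user_name': ['abc', 'wdt', 'sdf', 'dfe', 'dsd', 'erw', 'fds'],
    'number_of_apples': [1, 2, 3, 4, 5, 6, 2],
})

# Number the distinct dates 0, 1, 2, ... in chronological order,
# then put every two consecutive distinct dates into one chunk.
date_codes, _ = pd.factorize(df['action_date'], sort=True)
df['date_chunk'] = date_codes // 2

# Users per chunk.
users_per_chunk = df.groupby('date_chunk')['user_name'].apply(list)
print(users_per_chunk)

# Users per chunk and per apple count, without summing or averaging.
users_per_chunk_and_apples = (
    df.groupby(['date_chunk', 'number_of_apples'])['user_name'].apply(list)
)
print(users_per_chunk_and_apples)
```

Changing the divisor from 2 to another integer would give chunks of that many distinct dates.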
Upvotes: 1