Reputation: 1445
I have a pandas dataframe which looks as below
Date SKU Balance
0 1/1/2017 X1 8
1 1/1/2017 X2 45
2 1/1/2017 X1 47
3 1/1/2017 X2 16
4 2/1/2017 X1 14
5 2/1/2017 X2 67
6 2/1/2017 X2 9
8 2/1/2017 X1 66
9 2/1/2017 X1 158
My first goal is to generate multiple dataframe filtered by every single day
for which I coded
df_1stjan = df.query("Date == \"1/1/2017\"")
And I got the below result
Date SKU Balance
0 1/1/2017 X1 8
1 1/1/2017 X2 45
2 1/1/2017 X1 47
3 1/1/2017 X2 16
My second goal is to groupby SKU's and I coded
df_1stjan_uSKU = df_1stjan.groupby(['SKU','Date'], \
as_index=False).agg({'Balance':'sum'})
And I got the below result
Date SKU Balance
0 1/1/2017 X1 55
1 1/1/2017 X2 61
At the moment I could only code to generate df for only one date at a time a
But I need to write a function or loop to automate it for all the days of 2017.
Note the Date column has string dtype
Upvotes: 1
Views: 68
Reputation: 51155
I think you are making this too complicated on yourself. You have pretty much solved your own problem, but I would recommend doing your indexing after the initial groupby
and agg
.
Sample dataframe:
Balance Date SKU
0 8 1/1/2017 X1
1 45 1/1/2017 X2
2 47 1/1/2017 X1
3 16 1/1/2017 X2
4 22 1/2/2017 X3
5 24 1/2/2017 X3
6 25 1/3/2017 X4
7 3 1/3/2017 X4
groupby
with agg
df1 = df.groupby(['Date', 'SKU'], as_index=False).agg({'Balance':'sum'})
Date SKU Balance
0 1/1/2017 X1 55
1 1/1/2017 X2 61
2 1/2/2017 X3 46
3 1/3/2017 X4 28
to_datetime
to convert Date
column
df1['Date'] = pd.to_datetime(df1.Date, format='%m/%d/%Y')
date_range
with all days you would like to access
dr = pd.date_range('20170101','20170103')
loc
with loop to access slice for each day
for d in dr:
print(df1.loc[df1.Date.isin([d])])
Date SKU Balance
0 2017-01-01 X1 55
1 2017-01-01 X2 61
Date SKU Balance
2 2017-01-02 X3 46
Date SKU Balance
3 2017-01-03 X4 28
Upvotes: 2
Reputation: 29635
If you do first
df_group = df.groupby(['Date', 'C1', 'C2', 'C3', 'SKU']).sum()
Then you can create your dfs such as:
for date in set(df['Date']):
df_date = df_group.loc[date].reset_index()
# and do whatever with df_date, you can save them in a list for example
# to access them later but maybe the df_group.loc[date].reset_index() is enough for what you need
Upvotes: 0