Reputation: 3677
I have a dataframe like this. We will call it df1:
zone script datetimecomplete
0 2.0 Consumers 2018-05-03 16:09:48
1 2.0 Inventory 2018-05-03 16:09:48
2 2.0 Orders_Details 2018-05-03 16:09:48
3 3.0 Consumers 2018-05-03 16:09:48
4 3.0 Inventory 2018-05-03 16:09:48
5 3.0 Orders_Details 2018-05-03 16:09:48
6 6.0 Birthday 2018-05-03 16:09:48
I have to create another dataframe (df2) based on df1. The requirement is that for each 'zone' and 'script'
columns I need to generate the count of script in the last 24 hours(only need to compare the date not timestamps). Only need to append rows where datetimecomplete
column is Today minus 1 day.
df2 needs to have 4 columns:
[zone, script, datetimecomplete, totalcount]
zone - same as df1 script - same as df1 datetimecomplete - current date minus 1 day needs to appended into df2 totalcount - needs to be count of each individual script in the last 24hours from todays date.
Example of df2:
zone script datetimecomplete totalcount
2.0 Consumers 2018-05-02 16:09:48 2
2.0 Inventory 2018-05-02 16:09:48 1
2.0 Orders_Details 2018-05-02 16:09:48 1
3.0 Consumers 2018-05-02 16:09:48 1
3.0 Inventory 2018-05-02 16:09:48 1
3.0 Orders_Details 2018-05-02 16:09:48 4
6.0 Birthday 2018-05-02 16:09:48 6
I am not sure how to do this.
Code so far, gets me the count but not for the last 24hours from df1 nor does it create the totalcount
column:
df2 = df1.groupby(['zone', 'script', df1['datetimecomplete'] - pd.Timedelta(days=1)])['script'].count()
sample SQL query I am trying to accomplish:
select df1.zone, df1.script, currentdate - 1 as [datetimecomplete], count(df1.zone) as [TotalCount]
from [df1] as a
where df1.datetimecomplete = currentdate - 1
group by df1.zone, df1.script
Thank you in advance.
Upvotes: 0
Views: 73
Reputation: 29635
So to perform your problem if I understand, you can create a mask_24h
to select the data from df1
in the last 24h:
h24_ago = (pd.Timestamp.today() - pd.Timedelta(days=1)).date() # time 24h ago
mask_24h = df1['datetimecomplete'] >= h24_ago
Then to create your df2
you are right to use groupby
but I would do this way:
df2 = df1[mask_24h ].groupby(['zone','script']).\
count().reset_index().rename(columns = {'datetimecomplete':'totalcount'})
here you select the data from df1
within the last 24h with df1[mask_date]
then you groupby()
on 'zone'
and 'script'
and proceed the count()
. reset_index()
allows to keep the other columns and rename()
to rename. Now you are just missing the column 'datetimecomplete'
so do:
df2['datetimecomplete'] = h24_ago
Hope it works
Upvotes: 1