RustyShackleford
RustyShackleford

Reputation: 3677

How to generate new dataframe from SQL like clause?

I have a dataframe like this. We will call it df1:

  zone          script    datetimecomplete
0   2.0       Consumers 2018-05-03 16:09:48
1   2.0       Inventory 2018-05-03 16:09:48
2   2.0  Orders_Details 2018-05-03 16:09:48
3   3.0       Consumers 2018-05-03 16:09:48
4   3.0       Inventory 2018-05-03 16:09:48
5   3.0  Orders_Details 2018-05-03 16:09:48
6   6.0        Birthday 2018-05-03 16:09:48

I have to create another dataframe (df2) based on df1. The requirement is that for each 'zone' and 'script' columns I need to generate the count of script in the last 24 hours(only need to compare the date not timestamps). Only need to append rows where datetimecomplete column is Today minus 1 day.

df2 needs to have 4 columns:

[zone, script, datetimecomplete, totalcount]

zone - same as df1 script - same as df1 datetimecomplete - current date minus 1 day needs to appended into df2 totalcount - needs to be count of each individual script in the last 24hours from todays date.

Example of df2:

 zone          script    datetimecomplete    totalcount
  2.0       Consumers 2018-05-02 16:09:48     2
  2.0       Inventory 2018-05-02 16:09:48     1
  2.0  Orders_Details 2018-05-02 16:09:48     1
  3.0       Consumers 2018-05-02 16:09:48     1
  3.0       Inventory 2018-05-02 16:09:48     1
  3.0  Orders_Details 2018-05-02 16:09:48     4
  6.0        Birthday 2018-05-02 16:09:48     6

I am not sure how to do this.

Code so far, gets me the count but not for the last 24hours from df1 nor does it create the totalcount column:

df2 = df1.groupby(['zone', 'script', df1['datetimecomplete'] - pd.Timedelta(days=1)])['script'].count() 

sample SQL query I am trying to accomplish:

select df1.zone, df1.script, currentdate - 1 as [datetimecomplete], count(df1.zone) as [TotalCount]

from [df1] as a

where df1.datetimecomplete = currentdate - 1

group by df1.zone, df1.script

Thank you in advance.

Upvotes: 0

Views: 73

Answers (1)

Ben.T
Ben.T

Reputation: 29635

So to perform your problem if I understand, you can create a mask_24h to select the data from df1 in the last 24h:

h24_ago = (pd.Timestamp.today() - pd.Timedelta(days=1)).date() # time 24h ago
mask_24h = df1['datetimecomplete'] >= h24_ago

Then to create your df2 you are right to use groupby but I would do this way:

df2 = df1[mask_24h ].groupby(['zone','script']).\
         count().reset_index().rename(columns = {'datetimecomplete':'totalcount'})

here you select the data from df1 within the last 24h with df1[mask_date] then you groupby() on 'zone' and 'script' and proceed the count(). reset_index() allows to keep the other columns and rename() to rename. Now you are just missing the column 'datetimecomplete' so do:

df2['datetimecomplete'] = h24_ago

Hope it works

Upvotes: 1

Related Questions