Andre_k
Andre_k

Reputation: 1730

Pivoting Data in python

I'm calculating attendance of employees, here is the sample table

df = pd.DataFrame({
    'E_ID': [1001, 1001, 1001, 1002, 1002, 1002, 1002],
    'Date': [
        '28-07-2019 08:27:00',
        '28-07-2019 18:10:00',
        '29-07-2019 08:10:00',
        '28-07-2019 08:07:00',
        '29-07-2019 08:10:10',
        '29-07-2019 08:10:17',
        '29-07-2019 17:50:00'
     ]
})

I'm trying to calculate In-Time , Out-Time per day from the Date column of df.

Also there could be single entry per day which can be treated as Punch In time.

In-Time would be first punch in AND Out-Time would be last punch out.

And I'm expecting output something like this ,or may be something similar

E_ID    OfficePunch   In Punch           Out Punch
1001    28-08-2019    28-07-2019 08:27   28-07-2019 18:10
1001    29-08-2019    29-07-2019 08:10   29-07-2019 08:10
1002    28-08-2019    28-07-2019 08:07   28-07-2019 08:07
1002    29-08-2019    29-07-2019 08:10   29-07-2019 17:50

Is there any way by which it can be done?

Upvotes: 3

Views: 82

Answers (1)

jezrael
jezrael

Reputation: 862661

For python object dates in output OfficePunch column use Series.dt.date with aggregate by GroupBy.agg first and last value:

df['Date'] = pd.to_datetime(df['Date'])

df1 = (df.groupby(['E_ID', df['Date'].dt.date.rename('OfficePunch')])['Date']
        .agg([('In Punch','first'),('Out Punch','last')])
        .reset_index())
print (df1)
   E_ID OfficePunch            In Punch           Out Punch
0  1001  2019-07-28 2019-07-28 08:27:00 2019-07-28 18:10:00
1  1001  2019-07-29 2019-07-29 08:10:00 2019-07-29 08:10:00
2  1002  2019-07-28 2019-07-28 08:07:00 2019-07-28 08:07:00
3  1002  2019-07-29 2019-07-29 08:10:10 2019-07-29 17:50:00

print (df1.dtypes)
E_ID                    int64
OfficePunch            object
In Punch       datetime64[ns]
Out Punch      datetime64[ns]
dtype: object

If need datetimes in OfficePunch column use Series.dt.floor:

df['Date'] = pd.to_datetime(df['Date'])

df1 = (df.groupby(['E_ID', df['Date'].dt.floor('d').rename('OfficePunch')])['Date']
        .agg([('In Punch','first'),('Out Punch','last')])
        .reset_index())
print (df1)
   E_ID OfficePunch            In Punch           Out Punch
0  1001  2019-07-28 2019-07-28 08:27:00 2019-07-28 18:10:00
1  1001  2019-07-29 2019-07-29 08:10:00 2019-07-29 08:10:00
2  1002  2019-07-28 2019-07-28 08:07:00 2019-07-28 08:07:00
3  1002  2019-07-29 2019-07-29 08:10:10 2019-07-29 17:50:00

print (df1.dtypes)
E_ID                    int64
OfficePunch    datetime64[ns]
In Punch       datetime64[ns]
Out Punch      datetime64[ns]
dtype: object

Upvotes: 7

Related Questions