Reputation: 21
I have a dataset that lists employees and the timestamps at which they complete certain actions. It is divided into three columns: Employee, Date, Hour.
I want to count the number of employees that are active each hour. In Excel I would do this by adding a fourth column, EmpFactor, in which I perform a COUNTIFS operation:
=1/COUNTIFS(Name range;Name;Date range;Date;Hour range;Hour)
I can subsequently calculate the number of active employees by performing a SUMIF on the EmpFactor column.
I tried the following code to compose the EmpFactor column using pandas:
for name,date,time in zip(df['Employee'],df['Date'],df['Time']):
    df['EmpFactor'] = 1/(df[(df.Employee == name) and (df.Day == dag)
                            and (df.Time == tijd)].count())
This however doesn't work. I have searched extensively through numerous topics on SO, but haven't found a fitting answer yet.
Upvotes: 2
Views: 10157
Reputation: 85442
Starting with this dataframe:
import pandas as pd

df = pd.DataFrame({'Employee': list('ABCDEFGH'),
                   'Date': [1, 1, 1, 2, 2, 2, 3, 3],
                   'Time': [10, 10, 10, 11, 10, 11, 11, 12]})
print(df)
Output:
   Date Employee  Time
0     1        A    10
1     1        B    10
2     1        C    10
3     2        D    11
4     2        E    10
5     2        F    11
6     3        G    11
7     3        H    12
You can group by Date and Time and count the employees:
per_hour = df.groupby(['Date', 'Time']).count()
per_hour['EmpFactor'] = 1 / per_hour.Employee
print(per_hour)
Output:
           Employee  EmpFactor
Date Time
1    10           3   0.333333
2    10           1   1.000000
     11           2   0.500000
3    11           1   1.000000
     12           1   1.000000
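If the same employee can appear more than once within one hour, as the COUNTIFS formula in the question suggests, a per-row EmpFactor may be closer to what was asked. The following is a minimal sketch, not a definitive implementation. The sample data is hypothetical (employee A has two rows in the same hour), and the names rows_per_key, active and active_direct are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: employee A logs two actions in the same hour.
df = pd.DataFrame({'Employee': ['A', 'A', 'B', 'C', 'C'],
                   'Date': [1, 1, 1, 1, 2],
                   'Time': [10, 10, 10, 11, 10]})

# Number of rows per (Employee, Date, Time), broadcast back onto every row.
# This plays the role of COUNTIFS.
rows_per_key = df.groupby(['Employee', 'Date', 'Time'])['Employee'].transform('count')
df['EmpFactor'] = 1 / rows_per_key

# Summing EmpFactor per hour plays the role of SUMIF and yields the
# number of distinct employees in that hour.
active = df.groupby(['Date', 'Time'])['EmpFactor'].sum()

# The same count without the helper column.
active_direct = df.groupby(['Date', 'Time'])['Employee'].nunique()

print(active)
print(active_direct)
```

The nunique variant skips the helper column entirely, so it is probably the simpler choice unless EmpFactor is needed for something else.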
Upvotes: 2
Reputation: 12845
Suppose you have a DataFrame structured like this:
import pandas as pd
import numpy as np
df = pd.DataFrame([['Alice', '2012-03-05', 23],
                   ['Fred', '2012-03-05', 23],
                   ['Bob', '2012-12-12', 0]],
                  columns=('Employee', 'Date', 'Time'))
# Here you have:
  Employee        Date  Time
0    Alice  2012-03-05    23
1     Fred  2012-03-05    23
2      Bob  2012-12-12     0
# convert to a date
df['DateTime'] = pd.to_datetime(df['Date'])
# make it the index
df2 = df.set_index('DateTime')
# group by date and time
# (pd.Grouper replaces pd.TimeGrouper, which was removed in pandas 1.0)
g = df2.groupby([pd.Grouper(freq='D'), 'Time'])
# get counts:
print(g.count())
# Here you have:
                 Employee  Date
DateTime   Time
2012-03-05 23           2     2
2012-12-12 0            1     1
# to get inverted values:
print(1 / g.count())
                 Employee  Date
DateTime   Time
2012-03-05 23         0.5   0.5
2012-12-12 0          1.0   1.0
Of course, it would be better to make Time part of the DateTime column. You may practice on it if you wish :)
This approach is quite fast: it took about 3 minutes to group 47M rows on my laptop.
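For folding Time into the DateTime column, one possible sketch is shown below. It assumes Time holds whole hours as integers, as in the sample above. The variable name per_hour and the use of nunique (so that an employee is counted once per hour) are illustrative choices, not part of the approach timed above:

```python
import pandas as pd

df = pd.DataFrame([['Alice', '2012-03-05', 23],
                   ['Fred', '2012-03-05', 23],
                   ['Bob', '2012-12-12', 0]],
                  columns=('Employee', 'Date', 'Time'))

# Add the hour to the parsed date as a timedelta, giving one full
# timestamp per row (assumes Time is an integer number of hours).
df['DateTime'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['Time'], unit='h')

# Group on the combined timestamp; nunique counts each employee once per hour.
per_hour = df.groupby('DateTime')['Employee'].nunique()
print(per_hour)
```

Grouping on the column directly only returns hours that actually occur in the data, which keeps the result small when activity is sparse.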
Upvotes: 1