Reputation: 155
I am basically trying to create count of Employees present in office at any point in time from below DataFrame:
df = pd.DataFrame({
'Emp_id': [121,321,451,121,
451,451,321,755],
'TimeStamp': ['29-07-2019 08:55:55','29-07-2019 09:02:55','29-07-2019 09:05:50','29-07-2019 10:05:50',
'29-07-2019 10:07:50','29-07-2019 10:10:10','29-07-2019 10:20:10','29-07-2019 11:00:00'],
'Status': ['IN','IN','IN','OUT','OUT','IN','OUT','IN']
})
df
Emp_id TimeStamp Status
0 121 29-07-2019 08:55:55 IN
1 321 29-07-2019 09:02:55 IN
2 451 29-07-2019 09:05:50 IN
3 121 29-07-2019 10:05:50 OUT
4 451 29-07-2019 10:07:50 OUT
5 451 29-07-2019 10:10:10 IN
6 321 29-07-2019 10:20:10 OUT
7 755 29-07-2019 11:00:00 IN
My approach to solve this: group each "Emp_id" and find its max(TimeStamp) entry. From that if I can filter in the 'IN' Status and then get count of all those will be the number of Employees still sitting in office.
time_entry = df[['Emp_id','TimeStamp']].groupby(['Emp_id']).max()['TimeStamp']
Final_Status = []
for i in time_entry:
Final_Status.append(df.loc[df['TimeStamp'] == i, ['Status']])
Final_Status
>> output
[ Status
3 OUT, Status
6 OUT, Status
5 IN, Status
7 IN]
I am getting a weird "Final_Status" List. It has header and index with the list values
[ Status 3 OUT, Status 6 OUT, Status 5 IN, Status 7 IN]
And this is not letting me Count "IN" from the list i.e.
Final_Status.count('IN')
is not working.
Any other smarter approach to solve this problem is also appreciated.
Upvotes: 3
Views: 497
Reputation: 961
import pandas as pd
df = pd.DataFrame({
'Emp_id': [121,321,451,121,
451,451,321,755],
'TimeStamp': ['29-07-2019 08:55:55','29-07-2019 09:02:55','29-07-2019 09:05:50','29-07-2019 10:05:50',
'29-07-2019 10:07:50','29-07-2019 10:10:10','29-07-2019 10:20:10','29-07-2019 11:00:00'],
'Status': ['IN','IN','IN','OUT','OUT','IN','OUT','IN']
})
#Convert strings to pandas datetime object
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'])
#Group by employee id
groups = df.groupby('Emp_id')
Final_Status = []
for name, group in groups:
status = group.loc[group.TimeStamp.argmax(), 'Status']
Final_Status.append(status)
Upvotes: 2
Reputation: 620
I just performed some modification to your code to make the count work.
I added in the groupby function the argument as_index = False because we don't need index to get the last maximum time for each user.
For the creation of the Final_list, First I select the line and column as you did. Then take the values with the argument values (get rid of headers). Then I transform it into a list (list with only 1 string 'In' or 'Out' and take the first elements who will be the needed string
The two last lines are just here to print results
this is the final script. (This is not the best way but it will work)
import pandas as pd
df = pd.DataFrame({
'Emp_id': [121,321,451,121,
451,451,321,755],
'TimeStamp': ['29-07-2019 08:55:55','29-07-2019 09:02:55',
'29-07-2019 09:05:50','29-07-2019 10:05:50',
'29-07-2019 10:07:50','29-07-2019 10:10:10',
'29-07-2019 10:20:10','29-07-2019 11:00:00'],
'Status': ['IN','IN','IN','OUT','OUT','IN','OUT','IN']
})
print(df)
# OUTPUT
# Emp_id TimeStamp Status
#0 121 29-07-2019 08:55:55 IN
#1 321 29-07-2019 09:02:55 IN
#2 451 29-07-2019 09:05:50 IN
#3 121 29-07-2019 10:05:50 OUT
#4 451 29-07-2019 10:07:50 OUT
#5 451 29-07-2019 10:10:10 IN
#6 321 29-07-2019 10:20:10 OUT
#7 755 29-07-2019 11:00:00 IN
time_entry = df[['Emp_id',
'TimeStamp']].groupby(['Emp_id'],
as_index=False).max()['TimeStamp']
Final_Status = []
for i in time_entry:
Final_Status.append(df.loc[df['TimeStamp'] == i]['Status'].values.tolist()[0])
print(Final_Status)
#OUTPUT
# ['OUT', 'OUT', 'IN', 'IN']
print(Final_Status.count('IN'))
#OUTPUT
# 2
Upvotes: 1