How to get rid of headers and index from the Python list?

Question

I am basically trying to create count of Employees present in office at any point in time from below DataFrame:

df = pd.DataFrame({
                    'Emp_id': [121,321,451,121,
                               451,451,321,755],
                    'TimeStamp': ['29-07-2019 08:55:55','29-07-2019 09:02:55','29-07-2019 09:05:50','29-07-2019 10:05:50',
                                  '29-07-2019 10:07:50','29-07-2019 10:10:10','29-07-2019 10:20:10','29-07-2019 11:00:00'],
                    'Status': ['IN','IN','IN','OUT','OUT','IN','OUT','IN']
})
df

    Emp_id  TimeStamp           Status
0   121     29-07-2019 08:55:55 IN
1   321     29-07-2019 09:02:55 IN
2   451     29-07-2019 09:05:50 IN
3   121     29-07-2019 10:05:50 OUT
4   451     29-07-2019 10:07:50 OUT
5   451     29-07-2019 10:10:10 IN
6   321     29-07-2019 10:20:10 OUT
7   755     29-07-2019 11:00:00 IN

My approach to solve this: group each "Emp_id" and find its max(TimeStamp) entry. From that if I can filter in the 'IN' Status and then get count of all those will be the number of Employees still sitting in office.

time_entry = df[['Emp_id','TimeStamp']].groupby(['Emp_id']).max()['TimeStamp']

Final_Status = []

for i in time_entry:
    Final_Status.append(df.loc[df['TimeStamp'] == i, ['Status']])

Final_Status

>> output
[  Status
3    OUT,   Status
6    OUT,   Status
5     IN,   Status
7     IN]

Problem:

I am getting a weird "Final_Status" List. It has header and index with the list values

[ Status 3 OUT, Status 6 OUT, Status 5 IN, Status 7 IN]

And this is not letting me Count "IN" from the list i.e. Final_Status.count('IN') is not working.

Any other smarter approach to solve this problem is also appreciated.

Alex · Accepted Answer

import pandas as pd

df = pd.DataFrame({
                    'Emp_id': [121,321,451,121,
                               451,451,321,755],
                    'TimeStamp': ['29-07-2019 08:55:55','29-07-2019 09:02:55','29-07-2019 09:05:50','29-07-2019 10:05:50',
                                  '29-07-2019 10:07:50','29-07-2019 10:10:10','29-07-2019 10:20:10','29-07-2019 11:00:00'],
                    'Status': ['IN','IN','IN','OUT','OUT','IN','OUT','IN']
})

#Convert strings to pandas datetime object
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'])

#Group by employee id
groups = df.groupby('Emp_id')

Final_Status = []
for name, group in groups:
    status = group.loc[group.TimeStamp.argmax(), 'Status']
    Final_Status.append(status)

How to get rid of headers and index from the Python list?

Problem:

Answers (2)

Related Questions