Vivi
Vivi

Reputation: 155

How to get rid of headers and index from the Python list?

I am basically trying to create count of Employees present in office at any point in time from below DataFrame:

df = pd.DataFrame({
                    'Emp_id': [121,321,451,121,
                               451,451,321,755],
                    'TimeStamp': ['29-07-2019 08:55:55','29-07-2019 09:02:55','29-07-2019 09:05:50','29-07-2019 10:05:50',
                                  '29-07-2019 10:07:50','29-07-2019 10:10:10','29-07-2019 10:20:10','29-07-2019 11:00:00'],
                    'Status': ['IN','IN','IN','OUT','OUT','IN','OUT','IN']
})
df

    Emp_id  TimeStamp           Status
0   121     29-07-2019 08:55:55 IN
1   321     29-07-2019 09:02:55 IN
2   451     29-07-2019 09:05:50 IN
3   121     29-07-2019 10:05:50 OUT
4   451     29-07-2019 10:07:50 OUT
5   451     29-07-2019 10:10:10 IN
6   321     29-07-2019 10:20:10 OUT
7   755     29-07-2019 11:00:00 IN

My approach to solve this: group each "Emp_id" and find its max(TimeStamp) entry. From that if I can filter in the 'IN' Status and then get count of all those will be the number of Employees still sitting in office.

time_entry = df[['Emp_id','TimeStamp']].groupby(['Emp_id']).max()['TimeStamp']

Final_Status = []

for i in time_entry:
    Final_Status.append(df.loc[df['TimeStamp'] == i, ['Status']])

Final_Status

>> output
[  Status
3    OUT,   Status
6    OUT,   Status
5     IN,   Status
7     IN]

Problem:

I am getting a weird "Final_Status" List. It has header and index with the list values

[ Status 3 OUT, Status 6 OUT, Status 5 IN, Status 7 IN]

And this is not letting me Count "IN" from the list i.e. Final_Status.count('IN') is not working.

Any other smarter approach to solve this problem is also appreciated.

Upvotes: 3

Views: 497

Answers (2)

Alex
Alex

Reputation: 961

import pandas as pd

df = pd.DataFrame({
                    'Emp_id': [121,321,451,121,
                               451,451,321,755],
                    'TimeStamp': ['29-07-2019 08:55:55','29-07-2019 09:02:55','29-07-2019 09:05:50','29-07-2019 10:05:50',
                                  '29-07-2019 10:07:50','29-07-2019 10:10:10','29-07-2019 10:20:10','29-07-2019 11:00:00'],
                    'Status': ['IN','IN','IN','OUT','OUT','IN','OUT','IN']
})

#Convert strings to pandas datetime object
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'])

#Group by employee id
groups = df.groupby('Emp_id')

Final_Status = []
for name, group in groups:
    status = group.loc[group.TimeStamp.argmax(), 'Status']
    Final_Status.append(status)

Upvotes: 2

Skander HR
Skander HR

Reputation: 620

I just performed some modification to your code to make the count work.

I added in the groupby function the argument as_index = False because we don't need index to get the last maximum time for each user.

For the creation of the Final_list, First I select the line and column as you did. Then take the values with the argument values (get rid of headers). Then I transform it into a list (list with only 1 string 'In' or 'Out' and take the first elements who will be the needed string

The two last lines are just here to print results

this is the final script. (This is not the best way but it will work)

import pandas as pd


df = pd.DataFrame({
                    'Emp_id': [121,321,451,121,
                               451,451,321,755],
                    'TimeStamp': ['29-07-2019 08:55:55','29-07-2019 09:02:55',
                                  '29-07-2019 09:05:50','29-07-2019 10:05:50',
                                  '29-07-2019 10:07:50','29-07-2019 10:10:10',
                                  '29-07-2019 10:20:10','29-07-2019 11:00:00'],
                    'Status': ['IN','IN','IN','OUT','OUT','IN','OUT','IN']
})

print(df)

# OUTPUT
#   Emp_id            TimeStamp Status
#0     121  29-07-2019 08:55:55     IN
#1     321  29-07-2019 09:02:55     IN
#2     451  29-07-2019 09:05:50     IN
#3     121  29-07-2019 10:05:50    OUT
#4     451  29-07-2019 10:07:50    OUT
#5     451  29-07-2019 10:10:10     IN
#6     321  29-07-2019 10:20:10    OUT
#7     755  29-07-2019 11:00:00     IN

time_entry = df[['Emp_id',
                 'TimeStamp']].groupby(['Emp_id'],
                 as_index=False).max()['TimeStamp']

Final_Status = []

for i in time_entry:
    Final_Status.append(df.loc[df['TimeStamp'] == i]['Status'].values.tolist()[0])

print(Final_Status)
#OUTPUT
# ['OUT', 'OUT', 'IN', 'IN']
print(Final_Status.count('IN'))
#OUTPUT
# 2

Upvotes: 1

Related Questions