Reputation: 2947
I'm using Pandas filter()
and groupby()
to arrive at a count of employee types on a given day. I need to analyze this data for outliers. In this toy example, the outlier is any shift & day where there are not two (2) shifts.
shifts = [("Cashier", "Thursday"), ("Cashier", "Thursday"),
("Cashier", "Thursday"), ("Cook", "Thursday"),
("Cashier", "Friday"), ("Cashier", "Friday"),
("Cook", "Friday"), ("Cook", "Friday"),
("Cashier", "Saturday"), ("Cook", "Saturday"),
("Cook", "Saturday")]
labels = ["JOB_TITLE", "DAY"]
df = pd.DataFrame.from_records(shifts, columns=labels)
df1 = df[['JOB_TITLE', 'DAY']].groupby('DAY')
shifts_series = df1['JOB_TITLE'].value_counts()
The results:
DAY JOB_TITLE
Friday Cashier 2
Cook 2
Saturday Cook 2
Cashier 1
Thursday Cashier 3
Cook 1
Name: JOB_TITLE, dtype: int64
In this example, the data I need as output is,
[('Saturday', 'Cashier'), ('Thursday', 'Cook')]
[('Thursday', 'Cashier')]
I'm thinking the better question is... How can I work with Multi-index Series to generate the expected list of tuples?
Which I can then use to get to my end goal, ultimately, to add two new columns to the DataFrame Worked a Double: (True,False)
and Over staffed: (True,False)
.
Upvotes: 0
Views: 46
Reputation: 30070
Let's boolean index the series and convert index to list
filter1 = shifts_series[shifts_series == 1].index.tolist()
filter2 = shifts_series[shifts_series > 2].index.tolist()
$ print(filter1)
[('Saturday', 'Cashier'), ('Thursday', 'Cook')]
$ print(filter2)
[('Thursday', 'Cashier')]
Upvotes: 2