xtian
xtian

Reputation: 2947

How do I work with Pandas multi-level series?

I'm using Pandas filter() and groupby() to arrive at a count of employee types on a given day. I need to analyze this data for outliers. In this toy example, the outlier is any shift & day where there are not two (2) shifts.

shifts = [("Cashier", "Thursday"), ("Cashier", "Thursday"),
        ("Cashier", "Thursday"), ("Cook", "Thursday"),
        ("Cashier", "Friday"), ("Cashier", "Friday"),
        ("Cook", "Friday"), ("Cook", "Friday"),
        ("Cashier", "Saturday"), ("Cook", "Saturday"),
        ("Cook", "Saturday")]
labels = ["JOB_TITLE", "DAY"]
df = pd.DataFrame.from_records(shifts, columns=labels)
df1 = df[['JOB_TITLE', 'DAY']].groupby('DAY')
shifts_series = df1['JOB_TITLE'].value_counts()

The results:

DAY       JOB_TITLE
Friday    Cashier      2
          Cook         2
Saturday  Cook         2
          Cashier      1
Thursday  Cashier      3
          Cook         1
Name: JOB_TITLE, dtype: int64

In this example, the data I need as output is,

I'm thinking the better question is... How can I work with Multi-index Series to generate the expected list of tuples?

Which I can then use to get to my end goal, ultimately, to add two new columns to the DataFrame Worked a Double: (True,False) and Over staffed: (True,False).

Upvotes: 0

Views: 46

Answers (1)

Ynjxsjmh
Ynjxsjmh

Reputation: 30070

Let's boolean index the series and convert index to list

filter1 = shifts_series[shifts_series == 1].index.tolist()
filter2 = shifts_series[shifts_series > 2].index.tolist()
$ print(filter1)
[('Saturday', 'Cashier'), ('Thursday', 'Cook')]

$ print(filter2)
[('Thursday', 'Cashier')]

Upvotes: 2

Related Questions