Phil Collins
Phil Collins

Reputation: 327

Filter a 2 columns of a DataFrame using a dictionary containing lists Python

I have a pandas dataframe of employees that I need to filter based on 2 columns. I need to filter on department and level. So let's say we have department 'Human Resources' and within that it has level 1,2,3,4,5. I'm specifically looking for Human Resources level 2,4 and 5.

I have my desired departments and levels stored in dictionary, for example:

departments = dict({'Human Resources' : ['2','4','5'] ,'IT' : ['1','3','5','6'], etc.... })

My dataframe will list every employee, for all departments and for all levels (plus lots more). I now want to filter that dataframe using the dictionary above. So in the Human Resources example, I just want returned the employees who are in 'Human Resouces' and are in levels 2, 4 and 5.

An example of the df would be:

employee_ID   Department        Level
        001   Human Resources   1
        002   Human Resources   1
        003   Human Resources   2
        004   Human Resources   3
        005   Human Resources   4
        006   Human Resources   4
        007   Human Resources   5
        008   IT                1
        009   IT                2
        010   IT                3
        011   IT                4
        012   IT                5
        013   IT                6

Using the dictionary I've displayed above, my expected result would be

employee_ID   Department        Level
        003   Human Resources   2
        005   Human Resources   4
        006   Human Resources   4
        007   Human Resources   5
        008   IT                1
        010   IT                3
        012   IT                5
        013   IT                6

I have no idea how I'd do this?

Upvotes: 3

Views: 158

Answers (1)

Ben.T
Ben.T

Reputation: 29635

you can use groupby on Departement and use isin on the Level and get the value for the departement concerned with the name of the group.

#example data
departments = dict({'Human Resources' : ['2','4','5'] ,'IT' : ['1','3','5','6']})
df = pd.DataFrame({'Id':range(10), 
                   'Departement': ['Human Resources']*5+['IT']*5, 
                   'Level':list(range(1,6))*2})
#filter
print (df[df.groupby('Departement')['Level']
            .apply(lambda x: x.isin(departments[x.name]))])
   Id      Departement  Level
1   1  Human Resources      2
3   3  Human Resources      4
4   4  Human Resources      5
5   5               IT      1
7   7               IT      3
9   9               IT      5

Upvotes: 2

Related Questions