Tad
Tad

Reputation: 913

Pandas - compare 2 columns and choose value based on Priority

Below is my input dataframe

df = pd.DataFrame({'Level_DB': ['Level 1 Experienced' ,'Level 2 Expert', 'Level 1 Experienced', 'Level 2 Expert', 'Level 3 Thought Leader', 'Level 1 Experienced', 'Non-Certified', 'Level 3 Thought Leader', 'Certified', 'Certified', np.nan, 'Level 1 Experienced'], 
                    'Level_Legacy' :[ 'Certified', 'Level 1 Experienced', 'Level 3 Thought Leader', 'Level 3 Thought Leader Recert', 'Level 3 Thought Leader Recert', 'Non-Certified', 'non-certified', 'Level 2 Expert Recert', 'Level 1 Experienced', 'Non-Certified', 'Certified', '']})

And, the target column:'Output' should be produced after comparing both input columns 'Level_DB' and 'Level_Legacy' and choose the highest priority value. The priority List is as below

priority_List = ['Level 3 Thought Leader', 'Level 3 Thought Leader New', 'Level 3 Thought Leader Recert', 'Level 3 Thought Leader Recert Lapsed',
                 'Level 2 Expert', 'Level 2 Expert New', 'Level 2 Expert Recert', 'Level 2 Expert Recert Lapsed',
                 'Level 1 Experienced', 'Level 1 Experienced New', 'Level 1 Experienced Recert', 'Level 1 Experienced Recert Lapsed', 'Certified', 'Non-Certified' , 'non-certified']

Expected final DataFrame with desired 'Output' column as below

enter image description here

Nothing strikes my brain to start with. Please help

Upvotes: 1

Views: 276

Answers (2)

Erfan
Erfan

Reputation: 42906

We can use Series.map here by enumerating your priority_list and get the lowest index which is the highest in order:

dct_priority = {j:i for i, j in enumerate(priority_List)}
dct_priority_reverse = {i:j for i, j in enumerate(priority_List)}

df['Output'] = df.apply(lambda x: x.map(dct_priority)).min(axis=1).map(dct_priority_reverse)
                  Level_DB                   Level_Legacy                         Output
0      Level 1 Experienced                      Certified            Level 1 Experienced
1           Level 2 Expert            Level 1 Experienced                 Level 2 Expert
2      Level 1 Experienced         Level 3 Thought Leader         Level 3 Thought Leader
3           Level 2 Expert  Level 3 Thought Leader Recert  Level 3 Thought Leader Recert
4   Level 3 Thought Leader  Level 3 Thought Leader Recert         Level 3 Thought Leader
5      Level 1 Experienced                  Non-Certified            Level 1 Experienced
6            Non-Certified                  non-certified                  Non-Certified
7   Level 3 Thought Leader          Level 2 Expert Recert         Level 3 Thought Leader
8                Certified            Level 1 Experienced            Level 1 Experienced
9                Certified                  Non-Certified                      Certified
10                     NaN                      Certified                      Certified
11     Level 1 Experienced                                           Level 1 Experienced

Upvotes: 1

jezrael
jezrael

Reputation: 862611

Idea is create ordered categoricals, with reshape by DataFrame.stack, so output is max value per level=0:

from pandas.api.types import CategoricalDtype
cat_type = CategoricalDtype(categories=priority_List[::-1],ordered=True)

#solution if more columns in data
#df['Output'] = df[['Level_DB','Level_Legacy']].stack().astype(cat_type).max(level=0)
df['Output'] = df.stack().astype(cat_type).max(level=0)
print (df)
                  Level_DB                   Level_Legacy  \
0      Level 1 Experienced                      Certified   
1           Level 2 Expert            Level 1 Experienced   
2      Level 1 Experienced         Level 3 Thought Leader   
3           Level 2 Expert  Level 3 Thought Leader Recert   
4   Level 3 Thought Leader  Level 3 Thought Leader Recert   
5      Level 1 Experienced                  Non-Certified   
6            Non-Certified                  non-certified   
7   Level 3 Thought Leader          Level 2 Expert Recert   
8                Certified            Level 1 Experienced   
9                Certified                  Non-Certified   
10                     NaN                      Certified   
11     Level 1 Experienced                                  

                           Output  
0             Level 1 Experienced  
1                  Level 2 Expert  
2          Level 3 Thought Leader  
3   Level 3 Thought Leader Recert  
4          Level 3 Thought Leader  
5             Level 1 Experienced  
6                   Non-Certified  
7          Level 3 Thought Leader  
8             Level 1 Experienced  
9                       Certified  
10                      Certified  
11            Level 1 Experienced  

Upvotes: 1

Related Questions