Reputation: 913
Below is my input dataframe
df = pd.DataFrame({'Level_DB': ['Level 1 Experienced' ,'Level 2 Expert', 'Level 1 Experienced', 'Level 2 Expert', 'Level 3 Thought Leader', 'Level 1 Experienced', 'Non-Certified', 'Level 3 Thought Leader', 'Certified', 'Certified', np.nan, 'Level 1 Experienced'],
'Level_Legacy' :[ 'Certified', 'Level 1 Experienced', 'Level 3 Thought Leader', 'Level 3 Thought Leader Recert', 'Level 3 Thought Leader Recert', 'Non-Certified', 'non-certified', 'Level 2 Expert Recert', 'Level 1 Experienced', 'Non-Certified', 'Certified', '']})
And, the target column:'Output' should be produced after comparing both input columns 'Level_DB' and 'Level_Legacy' and choose the highest priority value. The priority List is as below
priority_List = ['Level 3 Thought Leader', 'Level 3 Thought Leader New', 'Level 3 Thought Leader Recert', 'Level 3 Thought Leader Recert Lapsed',
'Level 2 Expert', 'Level 2 Expert New', 'Level 2 Expert Recert', 'Level 2 Expert Recert Lapsed',
'Level 1 Experienced', 'Level 1 Experienced New', 'Level 1 Experienced Recert', 'Level 1 Experienced Recert Lapsed', 'Certified', 'Non-Certified' , 'non-certified']
Expected final DataFrame with desired 'Output' column as below
Nothing strikes my brain to start with. Please help
Upvotes: 1
Views: 276
Reputation: 42906
We can use Series.map
here by enumerating
your priority_list
and get the lowest index which is the highest in order:
dct_priority = {j:i for i, j in enumerate(priority_List)}
dct_priority_reverse = {i:j for i, j in enumerate(priority_List)}
df['Output'] = df.apply(lambda x: x.map(dct_priority)).min(axis=1).map(dct_priority_reverse)
Level_DB Level_Legacy Output
0 Level 1 Experienced Certified Level 1 Experienced
1 Level 2 Expert Level 1 Experienced Level 2 Expert
2 Level 1 Experienced Level 3 Thought Leader Level 3 Thought Leader
3 Level 2 Expert Level 3 Thought Leader Recert Level 3 Thought Leader Recert
4 Level 3 Thought Leader Level 3 Thought Leader Recert Level 3 Thought Leader
5 Level 1 Experienced Non-Certified Level 1 Experienced
6 Non-Certified non-certified Non-Certified
7 Level 3 Thought Leader Level 2 Expert Recert Level 3 Thought Leader
8 Certified Level 1 Experienced Level 1 Experienced
9 Certified Non-Certified Certified
10 NaN Certified Certified
11 Level 1 Experienced Level 1 Experienced
Upvotes: 1
Reputation: 862611
Idea is create ordered categoricals, with reshape by DataFrame.stack
, so output is max
value per level=0
:
from pandas.api.types import CategoricalDtype
cat_type = CategoricalDtype(categories=priority_List[::-1],ordered=True)
#solution if more columns in data
#df['Output'] = df[['Level_DB','Level_Legacy']].stack().astype(cat_type).max(level=0)
df['Output'] = df.stack().astype(cat_type).max(level=0)
print (df)
Level_DB Level_Legacy \
0 Level 1 Experienced Certified
1 Level 2 Expert Level 1 Experienced
2 Level 1 Experienced Level 3 Thought Leader
3 Level 2 Expert Level 3 Thought Leader Recert
4 Level 3 Thought Leader Level 3 Thought Leader Recert
5 Level 1 Experienced Non-Certified
6 Non-Certified non-certified
7 Level 3 Thought Leader Level 2 Expert Recert
8 Certified Level 1 Experienced
9 Certified Non-Certified
10 NaN Certified
11 Level 1 Experienced
Output
0 Level 1 Experienced
1 Level 2 Expert
2 Level 3 Thought Leader
3 Level 3 Thought Leader Recert
4 Level 3 Thought Leader
5 Level 1 Experienced
6 Non-Certified
7 Level 3 Thought Leader
8 Level 1 Experienced
9 Certified
10 Certified
11 Level 1 Experienced
Upvotes: 1