Reputation: 291
I understand this may be a very simple question, but I'm new to Python and not yet comfortable manipulating pandas DataFrames.
Let's say this is the example data:
Job       Skill         RelationType
Director  Manage staff  essential
Director  Manage staff  optional
Target:
Job       Skill         RelationType
Director  Manage staff  essential
Director  Manage staff  essential
Ideally, I want to write a function that, when the Skill is the same but the RelationType differs, overwrites the RelationType, in this case with essential. For the same job, an essential skill should always take priority over an optional one.
df['RelationType'] = df.groupby(['Job', 'Skill'])['RelationType'].transform('min')
Upvotes: 1
Views: 57
Reputation: 164713
Categorical data is useful for this task. First convert RelationType to a categorical series, ordered with higher-priority values first. Then perform a GroupBy operation on the key fields, using the min function to choose the highest-priority category in each group.
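import pandas as pd
df['RelationType'] = pd.Categorical(df['RelationType'], ordered=True,
                                    categories=['essential', 'optional'])
# Select the RelationType column before transforming so a Series is assigned back
df['RelationType'] = df.groupby(['Job', 'Skill'])['RelationType'].transform('min')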
print(df)
        Job        Skill RelationType
0  Director  ManageStaff    essential
1  Director  ManageStaff    essential
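Since the question asks for a function, the approach above could be wrapped up roughly as follows. This is only a sketch: the helper name keep_highest_priority, its priority parameter, and the extra "Budgeting" row are made up for illustration and are not part of the original data. It assumes the columns are named Job, Skill and RelationType, as in the question.

```python
import pandas as pd


def keep_highest_priority(df, priority):
    """Hypothetical helper: within each (Job, Skill) group, overwrite
    RelationType with the highest-priority value found in that group.

    `priority` lists the allowed values, most important first.
    Values not listed in `priority` would become NaN.
    """
    out = df.copy()
    # An ordered categorical makes 'min' follow the given priority order
    # rather than alphabetical order.
    out["RelationType"] = pd.Categorical(
        out["RelationType"], categories=priority, ordered=True
    )
    out["RelationType"] = (
        out.groupby(["Job", "Skill"])["RelationType"].transform("min")
    )
    return out


# Example data from the question, plus one illustrative extra row.
df = pd.DataFrame({
    "Job": ["Director", "Director", "Director"],
    "Skill": ["Manage staff", "Manage staff", "Budgeting"],
    "RelationType": ["essential", "optional", "optional"],
})

result = keep_highest_priority(df, ["essential", "optional"])
print(result)
```

Passing the priority list explicitly means the same helper should keep working if more levels are added later, even when their names do not happen to sort alphabetically in priority order.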
Upvotes: 1