Data manipulation in pandas - python

Question

I understand this may be a very simple question, but I'm new to Python and not yet comfortable manipulating pandas data frames.

Let's say I have the following example data:

Job         Skill          RelationType
Director    Manage staff   essential
Director    Manage staff   optional

Target

This is the result I want:

Job         Skill          RelationType
Director    Manage staff   essential
Director    Manage staff   essential

Ideally I want to write a function that, when the Skill is the same but the RelationType differs, overwrites the RelationType with the more important value (essential, in this case). In other words, for the same job an essential skill should always take precedence over an optional one.

Solved

df['RelationType'] = df.groupby(['Job', 'Skill'])['RelationType'].transform('min')
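For anyone who wants to try this on the example data, here is a minimal sketch. It assumes the columns are named exactly as in the example table (Job, Skill, RelationType). Note that on plain strings `min` compares alphabetically, so this only gives the intended answer because "essential" happens to sort before "optional".

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "Job": ["Director", "Director"],
    "Skill": ["Manage staff", "Manage staff"],
    "RelationType": ["essential", "optional"],
})

# For each (Job, Skill) group, broadcast the alphabetically smallest
# RelationType back onto every row of that group
df["RelationType"] = df.groupby(["Job", "Skill"])["RelationType"].transform("min")

print(df)
```

If the labels did not sort alphabetically in priority order, an explicit ordering would be needed instead of relying on string comparison.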

jpp · Accepted Answer

Categorical data is useful for this task. First convert RelationType to an ordered categorical series, listing the higher-priority values first.

Then group by the key fields and use min to pick the highest-priority category within each group.

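# Order the categories from highest to lowest priority
df['RelationType'] = pd.Categorical(df['RelationType'], ordered=True,
                                    categories=['essential', 'optional'])

# Within each (Job, Skill) group, keep the highest-priority category
df['RelationType'] = df.groupby(['Job', 'Skill'])['RelationType'].transform('min')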

print(df)

        Job        Skill RelationType
0  Director  ManageStaff    essential
1  Director  ManageStaff    essential
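Since the question asks for a function, the same idea could be wrapped up roughly like this. This is only a sketch: the function name `promote_relation_type`, its `keys` and `priority` parameters, and the extra "Write reports" row are made up for illustration and are not part of the original data.

```python
import pandas as pd


def promote_relation_type(df, keys=("Job", "Skill"),
                          priority=("essential", "optional")):
    """Return a copy in which every group defined by `keys` takes the
    highest-priority RelationType found in that group.

    `priority` lists the labels from most to least important.
    """
    out = df.copy()
    # An ordered categorical makes 'min' mean "highest priority"
    out["RelationType"] = pd.Categorical(
        out["RelationType"], categories=list(priority), ordered=True
    )
    out["RelationType"] = (
        out.groupby(list(keys))["RelationType"].transform("min")
    )
    return out


# Hypothetical example: the third row is an invented skill that only
# appears as optional, so it should stay optional
example = pd.DataFrame({
    "Job": ["Director", "Director", "Director"],
    "Skill": ["Manage staff", "Manage staff", "Write reports"],
    "RelationType": ["essential", "optional", "optional"],
})

result = promote_relation_type(example)
print(result)
```

Because the priority order is passed in explicitly, the result does not depend on how the labels happen to sort alphabetically.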
