learner
learner

Reputation: 877

Pandas dataframe replace rows based on values in the list and resulting value is based on another dataframe

I have a pandas dataframe df:

    df = 
          category        str_column
           cat1            str1
           cat2            str2
            ...             ...

and a

   parent_df =
           category        parent_category
           cat1             parent_of_cat1
           cat2             parent_of_cat2

and a list

  faulty_categories =
     ['cat1', ...]

I need to get all the categories in df which are present in faulty_categories and replace them by their parent:

So, expected output:

       df = 

         category       str_column
         parent_of_cat1   str1
         cat2             str2

How to achieve this?

I tried:

       df.loc[df['category'].isin(faulty_categories), 'category'] = parents_df.loc[parents_df['category'].isin(faulty_categories), 'parent']

But, this replaced category by NaN.

Upvotes: 0

Views: 70

Answers (3)

pquest
pquest

Reputation: 333

df

    category    str_column
0   cat1         str1
1   cat2         str2
2   cat3         str3

    category    parent_category
0   cat1        parent_of_cat1
1   cat3        parent_of_cat3
2   cat2        parent_of_cat2

faulty_cat = ['cat1', 'cat3']

Solution:-

for i in range(len(df)):
    if df.loc[i,'category'] in faulty_cat:
        df.loc[i,'category'] = parent_df['parent_category'][parent_df['category'] == df.loc[i,'category']].item()

Output df:-

      category          str_column
0   parent_of_cat1      str1
1   cat2                str2
2   parent_of_cat3      str3

Upvotes: 1

Mayank Porwal
Mayank Porwal

Reputation: 34056

Use df.merge and numpy.where:

In [2226]: import numpy as np

In [2227]: x = df.merge(parent_df)
In [2230]: x['category'] = np.where(x.category.isin(faulty_categories), x.parent_category, x.category)

In [2232]: x.drop('parent_category', 1, inplace=True)

In [2233]: x
Out[2233]: 
         category str_column
0  parent_of_cat1       str1
1            cat2       str2

Upvotes: 1

mozway
mozway

Reputation: 260580

You could use loc and map:

df.loc[df['category'].isin(faulty_categories), 'category'] = df['category'].map(parent_df.set_index('category')['parent_category'])

Or mask and map:

df['category'] = (df['category'].mask(df['category'].isin(faulty_categories),
                                      df['category'].map(parent_df.set_index('category')['parent_category'])
                                     )

Output:

         category str_column
0  parent_of_cat1       str1
1            cat2       str2

Upvotes: 1

Related Questions