Mila

Reputation: 73

Change the value of a pandas DataFrame column based on a condition that also depends on other columns

    Category              DishName   Id 
0   a                     Pistachio  621f4884e48bc60012364b13   
1   a                     Pistachio  621f4884e48bc60012364b13   
2   a                     Pistachio  621f4884e48bc60012364b13   
3   a                     achar      621f4884e48bc60012364b13   
4   b                     achar      621f4884e48bc60012364b13   
5   b                     achar      621f4884e48bc60012364b13   
6   a                     chicken    621f4884e48bc60012364b13   
7   b                     chicken    621f4884e48bc60012364b13   
8   c                     chicken    621f4884e48bc60012364b13 

My DataFrame has three columns: Category, DishName and Id. For each combination of Id and DishName, I have to assign one Category to every row of that group, using these rules:

Assign "a" if all the Category values in the group are "a".

Assign "b" if the Category values in the group are "a" and "b".

Assign "c" if the Category values in the group are "a", "b" and "c".

Expected output:

    Category              DishName   Id 
0   a                     Pistachio  621f4884e48bc60012364b13   
1   a                     Pistachio  621f4884e48bc60012364b13   
2   a                     Pistachio  621f4884e48bc60012364b13   
3   b                     achar      621f4884e48bc60012364b13   
4   b                     achar      621f4884e48bc60012364b13   
5   b                     achar      621f4884e48bc60012364b13   
6   c                     chicken    621f4884e48bc60012364b13   
7   c                     chicken    621f4884e48bc60012364b13   
8   c                     chicken    621f4884e48bc60012364b13 

Upvotes: 1

Answers (2)

mozway

Reputation: 262359

You can convert the column to an ordered Categorical and take the max per group:

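import pandas as pd

# An ordered Categorical makes 'max' follow the declared order a < b < c
# instead of plain string comparison; the max is then broadcast per DishName.
df['Category'] = (pd
                  .Series(pd.Categorical(df['Category'],
                                         categories=['a', 'b', 'c'], ordered=True),
                          index=df.index)
                  .groupby(df['DishName'])
                  .transform('max')
                  )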

NB. You wouldn't need the Categorical for just a, b, c, since those three values already sort lexicographically, but I imagine a real-life case wouldn't necessarily do so. For example, low < medium < high is ordered logically but not lexicographically.

Output:

  Category   DishName                        Id
0        a  Pistachio  621f4884e48bc60012364b13
1        a  Pistachio  621f4884e48bc60012364b13
2        a  Pistachio  621f4884e48bc60012364b13
3        b      achar  621f4884e48bc60012364b13
4        b      achar  621f4884e48bc60012364b13
5        b      achar  621f4884e48bc60012364b13
6        c    chicken  621f4884e48bc60012364b13
7        c    chicken  621f4884e48bc60012364b13
8        c    chicken  621f4884e48bc60012364b13
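
To illustrate the note about logical versus lexicographic order, here is a minimal sketch on made-up data. The column names Item and Level and the values are hypothetical, not from the question; it only contrasts a plain string max with an ordered Categorical max.

```python
import pandas as pd

# Hypothetical data: 'Item' and 'Level' are illustrative names, not from the question.
df = pd.DataFrame({
    "Item": ["x", "x", "y", "y", "y"],
    "Level": ["low", "high", "medium", "low", "medium"],
})

# Plain string max compares alphabetically, so 'low' beats 'high'.
lexicographic = df.groupby("Item")["Level"].transform("max")

# Ordered Categorical max follows the declared order low < medium < high.
levels = pd.Categorical(df["Level"],
                        categories=["low", "medium", "high"], ordered=True)
logical = (pd.Series(levels, index=df.index)
           .groupby(df["Item"])
           .transform("max"))

print(lexicographic.tolist())
print(logical.astype(str).tolist())
```

For item x the string-based max picks "low" because "l" sorts after "h", while the Categorical version picks "high", which is presumably what a real-life case would want.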

Upvotes: 2

BeRT2me

Reputation: 13251

df['Category'] = df.groupby('DishName')['Category'].transform('max')

Output:

  Category   DishName                        Id
0        a  Pistachio  621f4884e48bc60012364b13
1        a  Pistachio  621f4884e48bc60012364b13
2        a  Pistachio  621f4884e48bc60012364b13
3        b      achar  621f4884e48bc60012364b13
4        b      achar  621f4884e48bc60012364b13
5        b      achar  621f4884e48bc60012364b13
6        c    chicken  621f4884e48bc60012364b13
7        c    chicken  621f4884e48bc60012364b13
8        c    chicken  621f4884e48bc60012364b13
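
Since the question mentions considering both the Id and the DishName, a self-contained sketch that groups on both columns might look like the following. The DataFrame is rebuilt by hand from the sample in the question; grouping on both keys is an assumption about the intent, and with a single Id it gives the same result as grouping on DishName alone.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Category": ["a", "a", "a", "a", "b", "b", "a", "b", "c"],
    "DishName": ["Pistachio"] * 3 + ["achar"] * 3 + ["chicken"] * 3,
    "Id": ["621f4884e48bc60012364b13"] * 9,
})

# Broadcast the largest Category of each (Id, DishName) group to all its rows.
# This relies on 'a' < 'b' < 'c' as plain strings.
df["Category"] = df.groupby(["Id", "DishName"])["Category"].transform("max")

print(df)
```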

Upvotes: 0
