J.W.

Reputation: 113

How to assign values based on multiple columns in pandas?

Is there an elegant way to assign values based on multiple columns in a dataframe in pandas? Let's say I have a dataframe with 2 columns: FruitType and Color.

import pandas as pd
df = pd.DataFrame({'FruitType':['apple', 'banana','kiwi','orange','loquat'],
'Color':['red_black','yellow','greenish_yellow', 'orangered','orangeyellow']})

I would like to assign the value of a third column, 'isYellowSeedless', based on both 'FruitType' and 'Color' columns.

I have a list of fruits that I consider seedless, and would like to check whether the Color column contains the string "yellow".

seedless = ['banana', 'loquat']

How do I string this all together elegantly?

This is my attempt that didn't work:

df[(df['FruitType'].isin(seedless)) & (df['Color'].str.contains("yellow"))]['isYellowSeedless'] = True

Upvotes: 1

Views: 3233

Answers (2)

BENY

Reputation: 323396

Or you can try this; fruits that are not in seedless come out as NaN:

df['isYellowSeedless'] = df.loc[df.FruitType.isin(seedless), 'Color'].str.contains('yellow')
df
Out[546]: 
             Color FruitType isYellowSeedless
0        red_black     apple              NaN
1           yellow    banana             True
2  greenish_yellow      kiwi              NaN
3        orangered    orange              NaN
4     orangeyellow    loquat             True
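
If you would rather have False than NaN for the fruits outside seedless, one option is to reindex the partial result back onto the full index. This is only a sketch, assuming the df and seedless from the question; partial is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'FruitType': ['apple', 'banana', 'kiwi', 'orange', 'loquat'],
                   'Color': ['red_black', 'yellow', 'greenish_yellow', 'orangered', 'orangeyellow']})
seedless = ['banana', 'loquat']

# str.contains is evaluated only on the seedless rows, so the result
# carries only those index labels (1 and 4 here)
partial = df.loc[df['FruitType'].isin(seedless), 'Color'].str.contains('yellow')

# reindex restores the full index; rows missing from partial become False
df['isYellowSeedless'] = partial.reindex(df.index, fill_value=False)
print(df)
```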

Upvotes: 2

jezrael

Reputation: 863801

Use loc with a boolean mask:

m = (df['FruitType'].isin(seedless)) & (df['Color'].str.contains("yellow"))

df.loc[m, 'isYellowSeedless'] = True
print(df)
             Color FruitType isYellowSeedless
0        red_black     apple              NaN
1           yellow    banana             True
2  greenish_yellow      kiwi              NaN
3        orangered    orange              NaN
4     orangeyellow    loquat             True
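
For reference, here is the same approach as one runnable script, assuming the df and seedless from the question. As far as I understand, the attempt in the question fails because indexing df with a boolean mask returns a new object, so the following ['isYellowSeedless'] = True assignment never reaches df itself. A single df.loc call selects the rows and the column in one step and writes into df directly.

```python
import pandas as pd

df = pd.DataFrame({'FruitType': ['apple', 'banana', 'kiwi', 'orange', 'loquat'],
                   'Color': ['red_black', 'yellow', 'greenish_yellow', 'orangered', 'orangeyellow']})
seedless = ['banana', 'loquat']

# Combine both conditions into one boolean Series; the parentheses matter
# because & binds more tightly than comparisons and method results
m = (df['FruitType'].isin(seedless)) & (df['Color'].str.contains('yellow'))

# Row selector and column label go into the same .loc call,
# so the assignment targets df rather than a temporary copy
df.loc[m, 'isYellowSeedless'] = True
print(df)
```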

If you need True and False output:

df['isYellowSeedless'] = m
print(df)
             Color FruitType  isYellowSeedless
0        red_black     apple             False
1           yellow    banana              True
2  greenish_yellow      kiwi             False
3        orangered    orange             False
4     orangeyellow    loquat              True

For an if-else choice between two scalars, use numpy.where:

import numpy as np
df['isYellowSeedless'] = np.where(m, 'a', 'b')
print(df)
             Color FruitType isYellowSeedless
0        red_black     apple                b
1           yellow    banana                a
2  greenish_yellow      kiwi                b
3        orangered    orange                b
4     orangeyellow    loquat                a
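
A self-contained version of the numpy.where variant, in case you want to paste and run it. It assumes the df and seedless from the question; the labels 'a' and 'b' are just placeholders for whatever two values you need.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'FruitType': ['apple', 'banana', 'kiwi', 'orange', 'loquat'],
                   'Color': ['red_black', 'yellow', 'greenish_yellow', 'orangered', 'orangeyellow']})
seedless = ['banana', 'loquat']

m = (df['FruitType'].isin(seedless)) & (df['Color'].str.contains('yellow'))

# np.where picks 'a' where the mask is True and 'b' everywhere else
df['isYellowSeedless'] = np.where(m, 'a', 'b')
print(df)
```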

And to convert to 0 and 1:

df['isYellowSeedless'] = m.astype(int)
print(df)
             Color FruitType  isYellowSeedless
0        red_black     apple                 0
1           yellow    banana                 1
2  greenish_yellow      kiwi                 0
3        orangered    orange                 0
4     orangeyellow    loquat                 1

Upvotes: 2
