bbk611

Reputation: 321

For loop using np.where

I'm trying to create a new column in a DataFrame that labels domesticated animals with a 1. I'm using a for loop, but for some reason the loop only picks up the last item in the pets list. dog, cat, and gerbil should all be assigned a 1 in the domesticated column. Does anyone have a fix for this or a better approach?

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'creature': ['dog', 'cat', 'gerbil', 'mouse', 'donkey']
    })

pets = ['dog', 'cat', 'gerbil']

for pet in pets:
    df['domesticated'] = np.where(df['creature']==pet, 1, 0)

df

Upvotes: 1

Views: 5540

Answers (2)

busybear

Reputation: 10590

Your last loop iteration sets every non-gerbil row to 0. Each pass through the loop rebuilds the entire column, so when pet is gerbil, ALL entries that are not equal to gerbil get 0, including the dog and cat rows that earlier iterations had set to 1. You should check all values in pets at once instead. Try this:

df['domesticated'] = df['creature'].apply(lambda x: 1 if x in pets else 0)

If you want to stick with np.where:

df['domesticated'] = np.where(df['creature'].isin(pets), 1, 0)
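To see the difference side by side, here is a small self-contained sketch that runs the original loop and the isin version on the question's data. The column name loop_result is made up for this illustration so the two results can sit next to each other.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'creature': ['dog', 'cat', 'gerbil', 'mouse', 'donkey']})
pets = ['dog', 'cat', 'gerbil']

# Original loop: every pass rebuilds the whole column from scratch,
# so only the comparison against the last pet ('gerbil') survives.
# 'loop_result' is a hypothetical column name used only for comparison.
for pet in pets:
    df['loop_result'] = np.where(df['creature'] == pet, 1, 0)

# One membership test against the whole list, no loop needed.
df['domesticated'] = np.where(df['creature'].isin(pets), 1, 0)

print(df)
```

The loop_result column should end up with a 1 only on the gerbil row, while domesticated marks dog, cat, and gerbil.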

Upvotes: 4

gold_cy

Reputation: 14226

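The problem is that every loop iteration overwrites the whole column, so only the result for the last pet is kept. Test membership against the full list in one step instead: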

df['domesticated'] = df['creature'].isin(pets).astype(int)

  creature  domesticated
0      dog             1
1      cat             1
2   gerbil             1
3    mouse             0
4   donkey             0
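If you really do need to keep a loop (for example, because each pet needs some extra per-item logic), one possible approach is to accumulate the matches rather than reassigning the column each time. This is only a sketch; the variable name mask is an assumption introduced here, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'creature': ['dog', 'cat', 'gerbil', 'mouse', 'donkey']})
pets = ['dog', 'cat', 'gerbil']

# Start with an all-False mask, then OR in each pet's matches
# so hits from earlier iterations are preserved.
mask = pd.Series(False, index=df.index)
for pet in pets:
    mask |= df['creature'] == pet

# Convert the boolean mask to 1/0 once, after the loop.
df['domesticated'] = mask.astype(int)

print(df)
```

For a plain membership check, isin is shorter and avoids the Python-level loop, so this variant is mainly useful when the loop body has to do more than compare values.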

Upvotes: 1
