Whizkid95

Reputation: 271

Why does the np.where function also seem to work on values

I'm trying to change only certain values in a DataFrame column:

test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a':2}
test['col2'] = np.where(test.col1 == 'a', test.col1.map(lambda x: dict_curr[x]), test.col2)

However, this fails: even though I only want to touch the rows where col1 is 'a', I get the error

KeyError: 'b'

This implies that the lookup also runs for the rows where col1 is 'b'. Why is this, and how do I fix it?

Upvotes: 1

Views: 53

Answers (2)

javidcf

Reputation: 59711

The problem is that np.where is an ordinary function call: all of its arguments are evaluated in full first, and only then does it choose between them, element by element, according to the condition. So test.col1.map(...) queries the dictionary for 'b' and 'c' as well, even though those results would be discarded later. Probably the easiest fix is:

import pandas as pd
import numpy as np

test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a': 2}
test['col2'] = np.where(test.col1 == 'a', test.col1.map(lambda x: dict_curr.get(x, 0)), test.col2)

This yields 0 for keys that are not in the dictionary. Since np.where discards those entries anyway, it does not matter which placeholder value you use.
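To see the eager evaluation for yourself, here is a minimal sketch that records every key the lookup is asked about. The names lookup, looked_up and candidates are made up for this illustration and are not part of the question's code.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a': 2}

looked_up = []  # hypothetical log of every key passed to the lookup


def lookup(key):
    # Record the key, then fall back to 0 for keys missing from dict_curr.
    looked_up.append(key)
    return dict_curr.get(key, 0)


# This Series is built in full before np.where is ever called,
# so the lookup runs for 'b' and 'c' as well as 'a'.
candidates = test.col1.map(lookup)

# np.where only selects between two already-computed arrays.
result = np.where(test.col1 == 'a', candidates, test.col2)

print(sorted(set(looked_up)))
print(result.tolist())
```

The log contains 'b' and 'c' even though their placeholder values never reach the result, which is why the original dict_curr[x] lookup raised KeyError.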

Another easy way of getting the same result, which falls back to the existing col2 value for any key not in the dictionary, is:

import pandas as pd

test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a': 2}
test['col2'] = test.apply(lambda x: dict_curr.get(x.col1, x.col2), axis=1)

Upvotes: 1

flurble

Reputation: 1106

The error originates in the test.col1.map(lambda x: dict_curr[x]) part. It looks up every value of col1 in dict_curr, which only has an entry for 'a', not for 'b'.

You can also just index the dataframe:

test.loc[test.col1 == 'a', 'col2'] = 2
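If you would rather not hard-code the 2, the same indexing idea can take its values from the dictionary. This is only a sketch under the assumption that every key in dict_curr should be replaced; the name mask is introduced here for illustration.

```python
import pandas as pd

test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a': 2}

# Select only the rows whose col1 value is a key of dict_curr.
mask = test.col1.isin(list(dict_curr))

# The lookup now runs on the selected rows only, so no KeyError
# and no placeholder value is needed.
test.loc[mask, 'col2'] = test.loc[mask, 'col1'].map(dict_curr)

print(test)
```

Because the mask is applied before the lookup, rows with 'b' and 'c' are never passed to the dictionary and keep their original col2 values.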

Upvotes: 1
