Reputation: 271
I'm trying to change the values of only certain values in a dataframe:
test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a':2}
test['col2'] = np.where(test.col1 == 'a', test.col1.map(lambda x: dict_curr[x]), test.col2)
However, this doesn't seem to work because even though I'm looking only at the values in col1 that are 'a', the error says
KeyError: 'b'
Implying that it also looks at the values of col1 with values 'b'. Why is this? And how do I fix it?
Upvotes: 1
Views: 53
Reputation: 59711
The problem is that when you call np.where
all of its parameters are evaluated first, and then the result is decided depending on the condition. So the dictionary is queried also for 'b'
and 'c'
, even if those values will be discarded later. Probably the easiest fix is:
import pandas as pd
import numpy as np
test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a': 2}
test['col2'] = np.where(test.col1 == 'a', test.col1.map(lambda x: dict_curr.get(x, 0)), test.col2)
This will give the value 0
for keys not in the dictionary, but since it will be discarded later it does not matter which value you use.
Another easy way of getting the same result is:
import pandas as pd
test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a': 2}
test['col2'] = test.apply(lambda x: dict_curr.get(x.col1, x.col2), axis=1)
Upvotes: 1
Reputation: 1106
The error is originating from the test.col1.map(lambda x: dict_curr[x])
part. You look up the values from col1
in dict_curr
, which only has an entry for 'a'
, not for 'b'
.
You can also just index the dataframe:
test.loc[test.col1 == 'a', 'col2'] = 2
Upvotes: 1