Reputation: 4160
I am passing a dictionary to the map function to recode values in a column of a pandas DataFrame. However, I noticed that if a value in the original series is not explicitly in the dictionary, it gets recoded to NaN. Here is a simple example:
Typing...
s = pd.Series(['one','two','three','four'])
...creates the series
0 one
1 two
2 three
3 four
dtype: object
But applying the map...
recodes = {'one':'A', 'two':'B', 'three':'C'}
s.map(recodes)
...returns the series
0 A
1 B
2 C
3 NaN
dtype: object
I would prefer that any element of series s that is not in the recodes dictionary remain unchanged. That is, I would like to get the series below (with the original four instead of NaN).
0 A
1 B
2 C
3 four
dtype: object
Is there an easy way to do this, for example an option to pass to the map function? The challenge is that I can't always anticipate all the values that will appear in the series I'm recoding: the data will be updated in the future and new values could appear.
Thanks!
Upvotes: 48
Views: 32829
Reputation: 61
If you still want to use the map function (it can be faster than replace in some cases), you can subclass dict and define __missing__ so that any key absent from the dictionary is returned unchanged:
class MyDict(dict):
    # Called by dict lookup when the key is not present.
    def __missing__(self, key):
        return key
s = pd.Series(['one', 'two', 'three', 'four'])
recodes = MyDict({
'one':'A',
'two':'B',
'three':'C'
})
s.map(recodes)
0 A
1 B
2 C
3 four
dtype: object
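If you would rather keep recodes as a plain dict, here is a rough sketch of three other ways to get the same result on the example series. The variable names (via_get, via_fillna, via_replace) are just for illustration. I have not benchmarked these, so which one is fastest on your data is something you would need to measure.

```python
import pandas as pd

s = pd.Series(['one', 'two', 'three', 'four'])
recodes = {'one': 'A', 'two': 'B', 'three': 'C'}  # plain dict, no subclass

# Option 1: look each value up, falling back to the value itself.
via_get = s.map(lambda v: recodes.get(v, v))

# Option 2: map first, then fill the resulting NaN from the original series.
via_fillna = s.map(recodes).fillna(s)

# Option 3: replace only changes values that appear as keys in the dict.
via_replace = s.replace(recodes)

print(via_get.tolist())      # ['A', 'B', 'C', 'four']
print(via_fillna.tolist())   # ['A', 'B', 'C', 'four']
print(via_replace.tolist())  # ['A', 'B', 'C', 'four']
```

One caveat with the fillna variant: it cannot tell a missing key apart from a key that is deliberately mapped to NaN, so it only fits when recodes never maps anything to NaN.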
Upvotes: 6