Reputation: 65
I have a dataframe with a column in it containing state names. The names are a mix of US states abbreviations and other countries state name. And, I only want to change the us state names and leave others as it is.
Data
Country State
1 United States MI
2 United States PA
3 New Zealand Auckland
4 France Île-de-France
5 United States FL
I have tried this code:
states = {
'AK': 'Alaska',
'AL': 'Alabama',
'AR': 'Arkansas',
'AS': 'American Samoa',
'AZ': 'Arizona',
'CA': 'California',
'CO': 'Colorado',
'CT': 'Connecticut',
'DC': 'District of Columbia',
'DE': 'Delaware',
.
.
.
}
states = {state: abbrev for state, abbrev in states.items()}
Data['State_full'] = Data['State'].map(states)
It replaces the US states as it should but it also replaces the other countries state name with None. What am i missing? Thanks in advance.
Upvotes: 0
Views: 3641
Reputation: 262124
If efficiency is important, you can use states.get
with the same parameter as default value:
df['State'] = df['State'].map(lambda x: states.get(x, x))
map
is faster than replace
Example on 100k rows, map
is twice faster:
%%timeit
df['State'].map(lambda x: states.get(x, x))
# 13.3 ms ± 501 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df['State'].replace(states)
# 30.7 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 1
Reputation:
You can use Series.replace
. It can take a dict, where the keys of the dict are values to find, and the values of the dict are the replacements. If a value isn't found in the dict, it will be left as-is.
df['State'] = df['State'].replace(states)
Output:
>>> df
Country State
1 United States Minnesota
2 United States Pennsylvania
3 New Zealand Auckland
4 France Île-de-France
5 United States Florida
Upvotes: 1