Reputation: 57
I have a dataframe with a column that contains full state names, and I'm trying to convert them into two-letter abbreviations. I found a separate CSV file with all the state names and converted it into a dictionary. I then tried to use that dictionary to map the column, but the output column came back as all NaN.
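For reference, here is one way to turn a two-column CSV of names and abbreviations into such a dictionary. This is a sketch: the column names State and Abbreviation are hypothetical, and io.StringIO stands in for the real file. Stripping both columns also keeps stray spaces out of the keys.

```python
import io

import pandas as pd

# Stand-in for the real CSV file; the header names "State" and
# "Abbreviation" are hypothetical, so adjust them to the actual file.
csv_text = """State,Abbreviation
Alabama,AL
Alaska,AK
Arizona,AZ
"""

states_df = pd.read_csv(io.StringIO(csv_text))

# Pair each (whitespace-stripped) name with its abbreviation.
states_dic = dict(
    zip(states_df["State"].str.strip(), states_df["Abbreviation"].str.strip())
)
print(states_dic)  # {'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ'}
```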
The original dataframe has a single column with the city and state combined. I've split it into two separate columns, and the state column is the one I'm working with.
Here's what my dataframe looks like after I've split them:
print(newtop50.head())
                    city_state     2018         city       state
11698       New York, New York  8398748     New York    New York
 1443  Los Angeles, California  3990456  Los Angeles  California
 3415        Chicago, Illinois  2705994      Chicago    Illinois
17040           Houston, Texas  2325502      Houston       Texas
  665         Phoenix, Arizona  1660272      Phoenix     Arizona
This is what the first few entries of my dictionary look like:
print(states_dic)
{'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR', 'California': 'CA', 'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE', 'District of Columbia': 'DC', 'Florida': 'FL', 'Georgia': 'GA', 'Hawaii': 'HI', 'Idaho': 'ID'
Here's what I've tried:
newtop50['state'] = newtop50['state'].map(states_dic)
print(newtop50.head())
                    city_state     2018         city  state
11698       New York, New York  8398748     New York    NaN
 1443  Los Angeles, California  3990456  Los Angeles    NaN
 3415        Chicago, Illinois  2705994      Chicago    NaN
17040           Houston, Texas  2325502      Houston    NaN
  665         Phoenix, Arizona  1660272      Phoenix    NaN
I'm not sure what I'm missing here.
Upvotes: 1
Views: 614
Reputation: 75100
In case you don't want to build the mapping manually (the example dictionary looks incomplete), you can use the us package:
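import us  # third-party package, installed with: pip install us

# Build a {'Alabama': 'AL', ...} dictionary from the package's state data
states_dic = us.states.mapping('name', 'abbr')
newtop50['state'].map(states_dic)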
11698 NY
1443 CA
3415 IL
17040 TX
665 AZ
Upvotes: 1
Reputation: 10960
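You have explained that you split the city_state column into city and state. For map to work, each value must exactly match a dictionary key, so ' New York' will not find the key 'New York'. My guess is that the values in the state series have spaces on either side; for example, splitting "New York, New York" on the comma alone leaves " New York" with a leading space.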
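To confirm the whitespace theory, you can list the values that have no exact match among the dictionary keys and print them with repr(), which makes leading and trailing spaces visible. The small Series below is hypothetical, built to mimic what a comma-only split would produce; on the real data you would use newtop50['state'] instead.

```python
import pandas as pd

# Hypothetical sample: state values with a leading space, as left behind
# by splitting "City, State" on the comma alone.
state = pd.Series([" New York", " California", " Texas"])
states_dic = {"New York": "NY", "California": "CA", "Texas": "TX"}

# Keep only the values that are not exact keys of the dictionary;
# these are the values Series.map turns into NaN.
unmatched = state[~state.isin(list(states_dic))]

# repr() prints the quotes, so stray whitespace is easy to spot.
for value in unmatched.unique():
    print(repr(value))
```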
Try stripping the whitespace before mapping, and assign the result back so it is kept:
newtop50['state'] = newtop50['state'].str.strip().map(states_dic)
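The stripping can also be done at split time, so neither new column ever carries the extra space. This is a sketch that assumes every city_state value has the form "City, State" with a single comma; the three-row frame and trimmed dictionary are made up for illustration.

```python
import pandas as pd

# Hypothetical sample shaped like the question's dataframe.
newtop50 = pd.DataFrame(
    {"city_state": ["New York, New York", "Chicago, Illinois", "Houston, Texas"]}
)
states_dic = {"New York": "NY", "Illinois": "IL", "Texas": "TX"}

# Split on the first comma only; the state part comes back as " New York".
parts = newtop50["city_state"].str.split(",", n=1, expand=True)
newtop50["city"] = parts[0].str.strip()
newtop50["state"] = parts[1].str.strip()  # " New York" -> "New York"

# With the whitespace removed, every value is an exact key and maps cleanly.
newtop50["state"] = newtop50["state"].map(states_dic)
print(newtop50[["city", "state"]])
```

Splitting with n=1 keeps any later commas in the state part instead of spreading them over extra columns, which is a safer default if the data is not perfectly uniform.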
Upvotes: 1