Reputation: 37
I am trying to split a column containing City, State, and Zip into three columns. The data in the column is in this format: 'City, State Zip' - comma separating the city from state, and a space separating state from zip code. I can split out the city using:
df['Owner City State Zip'].str.split(',').apply(lambda x: x[0])
But for some reason when I try the following to split out the state and zip:
df['Owner City State Zip'].str.split(',').apply(lambda x: x[1])
I get the error - Index is out of range
Any help would be appreciated! This seems trivial but has been more difficult than I was expecting.
Upvotes: 1
Views: 2651
Reputation: 294258
Consider the df
import pandas as pd
df = pd.DataFrame({'Owner City State Zip': ["Los Angeles, CA 90015"]})
print(df)
    Owner City State Zip
0  Los Angeles, CA 90015
I'd use this handy bit of regex and the pandas str string accessor:
regex = r'(?P<City>[^,]+)\s*,\s*(?P<State>[^\s]+)\s+(?P<Zip>\S+)'
df['Owner City State Zip'].str.extract(regex)
          City State    Zip
0  Los Angeles    CA  90015
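As a minimal sketch of how this might fit into the original frame: the second sample row below ("Unknown") is made up for illustration, on the assumption that the question's data contains at least one value with no comma. The question does not confirm this, but such a row would make x[1] fail with an index error, while str.extract simply leaves NaN for rows that do not match. The names parts and result are placeholders.

```python
import pandas as pd

# Hypothetical sample data: the second row has no comma (an assumed cause
# of the index error in the question, not something the question confirms).
df = pd.DataFrame({'Owner City State Zip': ["Los Angeles, CA 90015", "Unknown"]})

# The split-and-index approach fails on the row without a comma,
# because split(',') yields a one-element list for it.
try:
    df['Owner City State Zip'].str.split(',').apply(lambda x: x[1])
    split_failed = False
except IndexError:
    split_failed = True

regex = r'(?P<City>[^,]+)\s*,\s*(?P<State>[^\s]+)\s+(?P<Zip>\S+)'

# str.extract returns one column per named group; rows that do not
# match the pattern get NaN in every column instead of raising.
parts = df['Owner City State Zip'].str.extract(regex)

# Attach the three new columns to the original frame.
result = df.join(parts)
print(result)
```

Rows that come back as NaN can then be inspected separately, for example with result[result['City'].isna()], to see which values do not follow the 'City, State Zip' format.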
Upvotes: 6