Reputation: 2735
This should be simple, but I could not get it to work.
I have some strings returned to me by the MapQuest geolocation API. I want to isolate the state name from strings like these, which is harder than it looks: think of 'Pennsylvania Avenue' (which is in D.C.), or 'Washington', which can be a state, a street name, or a city.
s = "Goldman Sachs Tower, 200, West Street, Battery Park City, Manhattan Community Board 1, New York County, NYC, New York, 10282, United States of America"
s = "9th St NW, Logan Circle/Shaw, Washington, District of Columbia, 20001, United States of America"
s = "Casper, Natrona County, Wyoming, United States of America"
But I noticed that MapQuest writes the state name just before the zip code, near the end of the string.
The following obtains the state name, provided there is a zip code:
s = s.split(",")
s = [x.strip() for x in s]
state = s[-3]
However, when there is no zip code, as in the third string, then I get the county (Natrona County).
I tried to eliminate the zip code by:
s = s.split(",")
s = [x.strip() for x in s if '\d{5}' not in x ]
But the regex '\d{5}' does not work: I want Wyoming, not Natrona County.
Upvotes: 1
Views: 50
Reputation: 2557
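Use the re module. The test '\d{5}' not in x only looks for that literal substring; it never runs a regex match. Compile the pattern and search with it instead: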
import re
s = "9th St NW, Logan Circle/Shaw, Washington, District of Columbia, 20001, United States of America"
s = s.split(",")
number = re.compile(r"\d{5}")
s = [x.strip() for x in s if not number.search(x)]
print(s)
print(s[-2])
output:
['9th St NW', 'Logan Circle/Shaw', 'Washington', 'District of Columbia', 'United States of America']
District of Columbia
Here is a short, easy tutorial on it: regex tutorial
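The same idea can be wrapped in a small function so it handles strings with and without a zip code. This is only a sketch: the name extract_state and the ZIP_RE pattern are made up for illustration, and it assumes the country is always the last field with the state right before it once any zip code is dropped. It uses fullmatch rather than search so that only a field consisting entirely of a zip code is removed; the optional ZIP+4 suffix is an assumption, not something seen in the sample strings.

```python
import re

# Hypothetical pattern: a 5-digit zip, optionally followed by -NNNN (assumption).
ZIP_RE = re.compile(r"\d{5}(?:-\d{4})?")

def extract_state(address):
    """Return the field just before the country, ignoring a zip code field."""
    # Split on commas and trim the whitespace around each field.
    parts = [p.strip() for p in address.split(",")]
    # Drop fields that are nothing but a zip code.
    parts = [p for p in parts if not ZIP_RE.fullmatch(p)]
    # Assumes the country is last and the state directly precedes it.
    return parts[-2]

samples = [
    "Goldman Sachs Tower, 200, West Street, Battery Park City, Manhattan Community Board 1, New York County, NYC, New York, 10282, United States of America",
    "9th St NW, Logan Circle/Shaw, Washington, District of Columbia, 20001, United States of America",
    "Casper, Natrona County, Wyoming, United States of America",
]

for s in samples:
    print(extract_state(s))
```

For the three strings from the question this prints New York, District of Columbia and Wyoming. It will still pick the wrong field if MapQuest returns a string without a country or with extra fields after the state.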
Upvotes: 2