Martien Lubberink

Reputation: 2735

Removing the zip code from a Python list (to obtain the state name from MapQuest output)

This should be simple, but I could not get it to work.

I have some strings returned by the MapQuest geolocation API. I want to isolate the state name from strings like the ones below, which is harder than it looks: 'Pennsylvania Avenue' is in D.C., and 'Washington' can be a state, a city, or a street name.

s = "Goldman Sachs Tower, 200, West Street, Battery Park City, Manhattan Community Board 1, New York County, NYC, New York, 10282, United States of America"
s = "9th St NW, Logan Circle/Shaw, Washington, District of Columbia, 20001, United States of America"
s = "Casper, Natrona County, Wyoming, United States of America"

But I noticed that MapQuest writes the state name just before the zip code, near the end of the string.

The following gets the state name, provided there is a zip code:

s = s.split(",")
s = [x.strip() for x in s]
state = s[-3]

However, when there is no zip code, as in the third string, then I get the county (Natrona County).

I tried to eliminate the zip code by:

s = s.split(",")
s = [x.strip() for x in s if '\d{5}' not in x ]

But the regex '\d{5}' does not work - I want Wyoming, not Natrona County.

Upvotes: 1

Views: 50

Answers (1)

Dinari

Reputation: 2557

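Use the re module. The in operator only tests for a literal substring, so '\d{5}' is never treated as a pattern; compile it and call search instead: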

import re

s = "9th St NW, Logan Circle/Shaw, Washington, District of Columbia, 20001, United States of America"

s = s.split(",")
number = re.compile(r"\d{5}")
s = [x.strip() for x in s if not number.search(x)]
print(s)
print(s[-2])  # the state is now always the second-to-last field

output:

['9th St NW', 'Logan Circle/Shaw', 'Washington', 'District of Columbia', 'United States of America']
District of Columbia

Here is a short, easy tutorial on it: regex tutorial
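
The same idea can be wrapped in a small helper so that it handles strings with and without a zip code. This is only a sketch: the name extract_state and the ZIP pattern (five digits, optionally followed by a hyphen and four more) are my own assumptions, and it relies on the state always being the field just before the country, as in the examples from the question.

```python
import re

# Assumption: a US ZIP code is five digits, optionally followed by "-NNNN".
ZIP_RE = re.compile(r"\d{5}(?:-\d{4})?")


def extract_state(address):
    """Return the field just before the country, ignoring any ZIP-code field.

    Hypothetical helper; assumes the last field is the country and the
    state directly precedes it (or precedes the ZIP code, if present).
    """
    parts = [p.strip() for p in address.split(",")]
    # fullmatch drops only fields that consist entirely of a ZIP code,
    # so a field that merely contains five digits is kept.
    parts = [p for p in parts if not ZIP_RE.fullmatch(p)]
    return parts[-2] if len(parts) >= 2 else None


print(extract_state("Casper, Natrona County, Wyoming, United States of America"))
```

Using fullmatch rather than search is a deliberate choice here: search would also discard a field that merely contains five consecutive digits somewhere inside it.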

Upvotes: 2
