Reputation: 303
I know similar questions have been asked, but none of the solutions I've found have worked (I've listed some of them at the bottom).
I have a list of zip codes where some of them include the +4, which I don't need. Others are either too short (typos) or foreign. Currently all cells are objects.
Example could be:
member state country zip joined
16081 NY UNITED STATES 11215 9/4/09
21186 NY UNITED STATES 5325 8/9/11
34999 NY UNITED STATES 11218 11/4/16
34999 NY FOOBAR STATES NaN 11/4/16
5033 NY UNITED STATES 11238-1630 11/7/16
35079 NY FOOBAR STATES SW4 9JX 11/13/16
35084 NY UNITED STATES 11217-2181 11/14/16
and I'd like to end up with
member state country zip joined
16081 NY UNITED STATES 11215 9/4/09
21186 NY UNITED STATES 5325 8/9/11
34999 NY UNITED STATES 11218 11/4/16
34999 NY FOOBAR STATES NA 11/4/16
5033 NY UNITED STATES 11238 11/7/16
35079 NY FOOBAR STATES SW4 9JX 11/13/16
35084 NY UNITED STATES 11217 11/14/16
Here are a few things I've tried in terms of coding:
for x in df.zip:
if len(x) > 5:
print x.split("-")[0]
x[:x.index("-")]
returns TypeError: object of type 'float' has no len()
df['zips'] = df['zip'].map(lambda x: x.rstrip('-'/n))
returns NameError: global name 'n' is not defined
def zipclip(x):
if x.isnumeric:
if len(x) > 5:
return z[:5]
elif len(x) < 5:
return "NA"
returns AttributeError: 'str' object has no attribute 'isnumeric'
df.zip = [line[:5] if line[:5].isnumeric() and line[6:].isnumeric() else\
line for line in zip if line]
returns TypeError: 'builtin_function_or_method' object is not iterable
Here are some of the places I've looked:
(sorry if I've gone overboard with documentation--I've been criticized in the past and wanted to make sure folks knew I've been working on it)
Upvotes: 2
Views: 998
Reputation: 210952
Try this:
In [88]: df.zip = df.zip.str.split('-').str[0]
In [89]: df
Out[89]:
member state country zip joined
0 16081 NY UNITED STATES 11215 9/4/09
1 21186 NY UNITED STATES 5325 8/9/11
2 34999 NY UNITED STATES 11218 11/4/16
3 34999 NY FOOBAR STATES NaN 11/4/16
4 5033 NY UNITED STATES 11238 11/7/16
5 35079 NY FOOBAR STATES SW4 9JX 11/13/16
6 35084 NY UNITED STATES 11217 11/14/16
Upvotes: 1
Reputation: 303
I'm guessing you guys have done it in a more pythonically correct way (which I'll be checking out)! But this worked for me ( just needed to change the NaN to a string, "NA"):
df['zip'].fillna('NA', inplace=True)
def ziplist(x):
if len(x) > 5:
return x[0:4]
else:
return x
df['zip'] = df['zip'].apply(ziplist);
Upvotes: 0
Reputation: 12515
You can take the zip
column and turn it into a list
df = pd.DataFrame({"zip": ['11215', '5325', '11218', np.nan, '11238-1630', 'SW4 9JX', '11217-2181']})
df['new_zip'] = [str(x).split('-')[0] if str(x) != 'nan' else 'NA' for x in df['zip']]
The split
method here will only "split" if there's a '-'
in the string, otherwise the whole string itself will be returned.
The results in :
>>> df
zip new_zip
0 11215 11215
1 5325 5325
2 11218 11218
3 NaN NA
4 11238-1630 11238
5 SW4 9JX SW4 9JX
6 11217-2181 11217
Upvotes: 0