Reputation: 259
I have a column called zipcode in a pandas DataFrame. Some of the rows contain NaN values, some contain the correct string format like '160 00', and the rest contain the wrong format like '18000'. I want to skip the NaN values (not drop them) and convert the wrong zipcodes into correct ones, for example '18000' -> '180 00'. Is it possible to do that by applying a lambda? This is all I have so far:
df['zipcode'].apply(lambda row: print(row[:3] + ' ' + row[3:]) if type(row) == str else row)
Sample of dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array(['11100', '246 00', '356 50',
                            np.nan, '18000', '156 00', '163 00']), columns=['zipcode'])
zipcode
0 11100
1 246 00
2 356 50
3 nan
4 18000
5 156 00
6 163 00
Thank you.
Upvotes: 1
Views: 52
Reputation: 71689
Let us try .str.replace:
df['zipcode'] = df['zipcode'].str.replace(r'(\d{3})\s*(\d+)', r'\g<1> \g<2>', regex=True)
zipcode
0 111 00
1 246 00
2 356 50
3 nan
4 180 00
5 156 00
6 163 00
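If you would rather stay with apply and a lambda, as in the question, something along these lines should also work. This is only a sketch: it assumes the column holds real NaN values (so the frame is built from a plain list here, not from np.array, which turns np.nan into the string 'nan'), and it returns the new string from the lambda, since print returns None.

```python
import numpy as np
import pandas as pd

# Assumption: a plain Python list keeps np.nan as a real missing value
df = pd.DataFrame({'zipcode': ['11100', '246 00', '356 50',
                               np.nan, '18000', '156 00', '163 00']})

# Strings: first three characters, one space, then the rest without
# surrounding whitespace. Anything that is not a string (NaN) is
# returned unchanged.
df['zipcode'] = df['zipcode'].apply(
    lambda z: z[:3] + ' ' + z[3:].strip() if isinstance(z, str) else z
)

print(df)
```

The isinstance check is what lets NaN pass through untouched, so no rows are dropped.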
Upvotes: 2