Reputation: 33
I am looking to convert ZIP+4 codes into 5-digit ZIP codes in a pandas DataFrame. I want to detect values that contain a ZIP+4 code and keep just the first 5 digits. Effectively, I want to do something like the code below (although it doesn't work in this form):
df.replace('^(\d{5}-?\d{4})', group(1), regex=True)
The following code does the same thing for a list; I'm looking to do the same in the DataFrame.
import re

my_input = ['01234-5678', '012345678', '01234', 'A1A 1A1', 'A1A1A1']
expression = re.compile(r'^(\d{5})-?(\d{4})?$')
my_output = []
for string in my_input:
    # Keep only the 5-digit part when the value is a ZIP or ZIP+4 code
    if m := expression.match(string):
        my_output.append(m.group(1))
    else:
        my_output.append(string)
Upvotes: 1
Views: 324
Reputation: 627343
You can use
df = df.replace(r'^(\d{5})-?\d{4}$', r'\1', regex=True)
See the regex demo.
Details:
^ - start of string
(\d{5}) - Group 1 (\1): five digits
-? - an optional -
\d{4} - any four digits
$ - end of string.
Upvotes: 1
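As a quick check of the replacement in the answer, here is a minimal sketch that runs it on the sample values from the question. The DataFrame and its column name `zip` are assumptions made up for illustration; they are not from the original post.

```python
import pandas as pd

# Hypothetical sample frame built from the question's example values;
# the column name "zip" is an assumption for illustration.
df = pd.DataFrame(
    {"zip": ["01234-5678", "012345678", "01234", "A1A 1A1", "A1A1A1"]}
)

# A full ZIP+4 value (with or without the hyphen) is replaced by Group 1,
# its first five digits. Values that do not match are left untouched.
df = df.replace(r'^(\d{5})-?\d{4}$', r'\1', regex=True)

result = df["zip"].tolist()
print(result)
```

Plain 5-digit ZIP codes and the non-US postal codes do not match the pattern, so they pass through unchanged. If only one column should be touched, calling `Series.str.replace` with the same pattern and `regex=True` on that column is likely a reasonable alternative.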