Pete

Reputation: 33

Converting ZIP+4 codes to five-digit ZIP codes in Python

I am looking to convert ZIP+4 codes into five-digit ZIP codes in a pandas DataFrame. I want it to detect when a value is a ZIP+4 code and keep just the first five digits. I effectively want to do something like the code below (although it doesn't work in this form):

df.replace('^(\d{5}-?\d{4})', group(1), regex=True)

The following code does this for a list; I'm looking to do the same thing in the DataFrame.

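import re

my_input = ['01234-5678', '012345678', '01234', 'A1A 1A1', 'A1A1A1']
expression = re.compile(r'^(\d{5})-?(\d{4})?$')

my_output = []
for string in my_input:
    # Keep only the first five digits when the string is a ZIP or ZIP+4 code
    if m := expression.match(string):  # walrus operator requires Python 3.8+
        my_output.append(m.group(1))   # reuse the match instead of matching twice
    else:
        my_output.append(string)       # leave non-ZIP values unchanged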
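As a sketch (not part of the original question), the same loop can be condensed with `Pattern.sub`, which returns strings that don't match unchanged. It redefines the question's `my_input` and `expression` so it runs on its own:

```python
import re

my_input = ['01234-5678', '012345678', '01234', 'A1A 1A1', 'A1A1A1']
expression = re.compile(r'^(\d{5})-?(\d{4})?$')

# Matching strings are replaced by Group 1 (the first five digits);
# non-matching strings pass through sub() untouched.
my_output = [expression.sub(r'\1', s) for s in my_input]
print(my_output)  # ['01234', '01234', '01234', 'A1A 1A1', 'A1A1A1']
```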

Upvotes: 1

Views: 324

Answers (1)

Wiktor Stribiżew

Reputation: 627343

You can use

df = df.replace(r'^(\d{5})-?\d{4}$', r'\1', regex=True)
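A minimal runnable check, assuming a hypothetical single-column frame named `zip` built from the question's sample values. With `regex=True`, `replace` substitutes inside every string cell, and the `r'\1'` backreference keeps Group 1:

```python
import pandas as pd

# Hypothetical sample frame using the question's inputs
df = pd.DataFrame({'zip': ['01234-5678', '012345678', '01234', 'A1A 1A1', 'A1A1A1']})

# ZIP+4 values (with or without the hyphen) collapse to their first five digits;
# plain ZIPs and non-ZIP strings don't match and are left as they are.
df = df.replace(r'^(\d{5})-?\d{4}$', r'\1', regex=True)
print(df['zip'].tolist())
```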


Details:

  • ^ - start of string
  • (\d{5}) - Group 1 (\1): five digits
  • -? - an optional -
  • \d{4} - any four digits
  • $ - end of string.
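Note that `df.replace(..., regex=True)` touches every string column, so any other nine-digit value (for example an ID) would also be truncated. A sketch that limits the substitution to one column with `Series.str.replace`, assuming hypothetical columns `zip` and `ref`:

```python
import pandas as pd

# Hypothetical frame: 'zip' holds ZIP/ZIP+4 codes, 'ref' is an unrelated
# nine-digit identifier that a frame-wide replace would also rewrite.
df = pd.DataFrame({
    'zip': ['01234-5678', '012345678', '01234', 'A1A 1A1'],
    'ref': ['123456789', '987654321', '111112222', '000000000'],
})

# Same pattern, applied to the 'zip' column only; regex=True is passed
# explicitly because its default is False in pandas 2.x.
df['zip'] = df['zip'].str.replace(r'^(\d{5})-?\d{4}$', r'\1', regex=True)
print(df['zip'].tolist())
print(df['ref'].tolist())
```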

Upvotes: 1
