Duck_dragon

Reputation: 450

Regex for putting comma before city name in address

Generally, an address comes with comma separation and can be split using a simple regex, e.g.

123 Main St, Los Angeles, CA, 90210

Here we can apply a regex and split on the commas. But in my database, addresses are stored without commas, e.g.

A Better Property Management<br/> 6621 E PACIFIC COAST HWY<br/> STE 255<br/> LONG BEACH CA 90803-4241 

I want to insert a comma after the city, before the state abbreviation. Something like this:

A Better Property Management<br/> 6621 E PACIFIC COAST HWY<br/> STE 255<br/> LONG BEACH ,CA 90803-4241

I was thinking about finding the last two-letter word from the end and putting a comma before it using a regex. But I also need to account for situations where the address is incomplete, or the city or postal code is missing. Is there a way this can be done? I only found solutions that split on a comma, not the reverse.

I was wondering whether we could select the last two-letter word before the numbers with something like [A-Za-z]{2} (I don't know if this is correct), and at the same time do this only if the string ends with numbers.

I tried

(\b(AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY|Alabama|Alaska|Arizona|Arkansas|California|Colorado|Connecticut|Delaware|District of Columbia|Florida|Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|Louisiana|Maine|Maryland|Massachusetts|Michigan|Minnesota|Mississippi|Missouri|Montana|Nebraska|Nevada|New Hampshire|New Jersey|New Mexico|New York|North Carolina|North Dakota|Ohio|Oklahoma|Oregon|Pennsylvania|Rhode Island|South Carolina|South Dakota|Tennessee|Texas|Utah|Vermont|Virginia|Washington|West Virginia|Wisconsin|Wyoming)\b)

https://regex101.com/r/75fqO6/1

Upvotes: 0

Views: 51

Answers (1)

Wiktor Stribiżew

Reputation: 626738

You can use

[a-zA-Z]+\s+\d(?:[\d-]*\d)?$

Replace with ,$0. See the regex demo. Details:

  • [a-zA-Z]+ - one or more letters
  • \s+ - one or more whitespaces
  • \d - a digit
  • (?:[\d-]*\d)? - an optional substring of zero or more digits/hyphens and then a digit
  • $ - end of string.

The $0 in the replacement is a backreference to the whole match value: all text matched by the regex is put back where it was found, with a comma prepended.
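As a minimal sketch of how this replacement might be applied, here is the same pattern in Python, which is an assumption since the question does not name a language. Python's re module does not accept $0 in replacement strings, so \g<0> stands in for it. The function name add_comma_before_state is hypothetical, and the rstrip() call is an added assumption to cope with trailing whitespace such as the space at the end of the sample address.

```python
import re

# Pattern from the answer: a word, whitespace, then a run of digits/hyphens
# that starts and ends with a digit, anchored at the end of the string.
PATTERN = re.compile(r"[a-zA-Z]+\s+\d(?:[\d-]*\d)?$")


def add_comma_before_state(address: str) -> str:
    """Prepend a comma to the trailing '<word> <digits>' part, if there is one.

    Hypothetical helper; strings that do not end in digits are returned as-is
    (apart from trailing whitespace being removed).
    """
    # rstrip() is an assumption: without it, a trailing space stops `$` matching.
    # \g<0> is Python's spelling of the whole-match backreference ($0 elsewhere).
    return PATTERN.sub(r",\g<0>", address.rstrip())


full = ("A Better Property Management<br/> 6621 E PACIFIC COAST HWY<br/> "
        "STE 255<br/> LONG BEACH CA 90803-4241 ")
print(add_comma_before_state(full))
# A Better Property Management<br/> 6621 E PACIFIC COAST HWY<br/> STE 255<br/> LONG BEACH ,CA 90803-4241

# No digits at the end, so nothing is changed.
print(add_comma_before_state("6621 E PACIFIC COAST HWY<br/> LONG BEACH"))
# 6621 E PACIFIC COAST HWY<br/> LONG BEACH
```

Note that the pattern accepts any word followed by trailing digits, so an incomplete address ending in "STE 255" would also receive a comma. Replacing [a-zA-Z]+ with the alternation of state abbreviations from the question would be one way to tighten it.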

Upvotes: 1
