Brian Brock
Brian Brock

Reputation: 351

regexp_replace string

I'm using regexp_replace to standardize mailing addresses and I've encountered a situation I'm having trouble with.

Consider the following two addresses and what their result should be:

In the second address, "East" is the actual name of the street, not a directional indicator.

For my query, I've attempted

SELECT
     regexp_replace(address, 'East[^ St]', 'E ')

but this fails to convert the first address to it's proper format.

How can I write my regexp_replace such that the word East is converted to an 'E' in the first address, but leaves the word intact in the second address?

Upvotes: 1

Views: 642

Answers (2)

Gary_W
Gary_W

Reputation: 10360

This answers your question with REGEXP_REPLACE(). It looks for the string ' EAST' (don't want to catch the case where 'east' is the end of another word) followed by a space, one or more characters, another space and the string 'St' which is remembered in a group. If found, replace it with ' E' followed by the second remembered group (the space followed by the one or more characters followed by the space and 'St'. This is needed as they are 'consumed' by the regex engine as it moves left to right analyzing the string so you need to put them back. Note I added a bunch of different test formats (always test for the unexpected too!):

SQL> with tbl(address) as (
     select '115 1/2 East 6th St'       from dual union
     select '115 1/2 NorthEast 6th St'  from dual union
     select '115 1/2 East 146th St'     from dual union
     select '115 1/2 East North 1st St' from dual union
     select '818 East Ave'              from dual union
     select '818 Woodward'              from dual union
     select '818 East St'               from dual
   )
   select regexp_replace(address, '( East)( .+ St)', ' E\2') new_addr
   from tbl;

NEW_ADDR
------------------------------------------------------------------------
115 1/2 E 146th St
115 1/2 E 6th St
115 1/2 E North 1st St
115 1/2 NorthEast 6th St
818 East Ave
818 East St
818 Woodward

7 rows selected.

Upvotes: 1

Mr. Llama
Mr. Llama

Reputation: 20889

Your current pattern matches the literal text East followed by any single character that isn't space, S, or t. I'm assuming you probably meant to use a negative lookahead to make sure that "East" doesn't come before " St", but sadly Oracle doesn't support negative lookaheads. Instead, you'll need to make the REGEXP_REPLACE conditional:

CASE
    WHEN address LIKE '%East%' AND address NOT LIKE '%East St%'
        THEN REGEXP_REPLACE(address, your_pattern, your_replacement)
    ELSE address
END

Upvotes: 2

Related Questions