Reputation: 351
I'm using regexp_replace to standardize mailing addresses and I've encountered a situation I'm having trouble with.
Consider the following two addresses and what their result should be:
In the second address, "East" is the actual name of the street, not a directional indicator.
For my query, I've attempted
SELECT
regexp_replace(address, 'East[^ St]', 'E ')
but this fails to convert the first address to it's proper format.
How can I write my regexp_replace such that the word East is converted to an 'E' in the first address, but leaves the word intact in the second address?
Upvotes: 1
Views: 642
Reputation: 10360
This answers your question with REGEXP_REPLACE(). It looks for the string ' EAST'
(don't want to catch the case where 'east' is the end of another word) followed by a space, one or more characters, another space and the string 'St'
which is remembered in a group. If found, replace it with ' E'
followed by the second remembered group (the space followed by the one or more characters followed by the space and 'St'
. This is needed as they are 'consumed' by the regex engine as it moves left to right analyzing the string so you need to put them back. Note I added a bunch of different test formats (always test for the unexpected too!):
SQL> with tbl(address) as (
select '115 1/2 East 6th St' from dual union
select '115 1/2 NorthEast 6th St' from dual union
select '115 1/2 East 146th St' from dual union
select '115 1/2 East North 1st St' from dual union
select '818 East Ave' from dual union
select '818 Woodward' from dual union
select '818 East St' from dual
)
select regexp_replace(address, '( East)( .+ St)', ' E\2') new_addr
from tbl;
NEW_ADDR
------------------------------------------------------------------------
115 1/2 E 146th St
115 1/2 E 6th St
115 1/2 E North 1st St
115 1/2 NorthEast 6th St
818 East Ave
818 East St
818 Woodward
7 rows selected.
Upvotes: 1
Reputation: 20889
Your current pattern matches the literal text East
followed by any single character that isn't space, S
, or t
. I'm assuming you probably meant to use a negative lookahead to make sure that "East" doesn't come before " St", but sadly Oracle doesn't support negative lookaheads. Instead, you'll need to make the REGEXP_REPLACE conditional:
CASE
WHEN address LIKE '%East%' AND address NOT LIKE '%East St%'
THEN REGEXP_REPLACE(address, your_pattern, your_replacement)
ELSE address
END
Upvotes: 2