Reputation: 83
I need to split a string from one table into two columns in another table. There are a few different variety of numbers and rules but no clear delimiter. Can I use combo of SUBSTR and INSTR or do I need to use if-then loops in PL/SQL to satisfy all the rules?
Input Table
5 Kent Street
3 A lindt Street
2/15 bold Street
9/34-36 carmen Road
12/5a sandford Street
Result
Number |Street
--------------------
5 |Kent Street
3A |lindt Street
2/15 |bold Street
9/34-36 |carmen Road
12/5a |sandford Street
Upvotes: 1
Views: 3722
Reputation: 2096
I wouldn't do this in pl/sql, it is really unnecessary.
Oracle SQL has REGEXP_SUBSTR, REGEXP_REPLACE, REGEXP_COUNT. You also have IF and CASE expressions you can put in the SELECT clause. Go to the SQL reference, FUNCTIONS section: http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions.htm#i1482196
Your dataset has some interesting variations - letters and fractions in the house/building number. What you don't have (yet) is street names in multiple parts (e.g. Melnea Cass Boulevard here in Boston) or street names with concatenated/missing ("Broadway") or unusual ("Cedar Passway") 'street' designators.
Start with sample_data as a refactored query to hold your input data. You'll probably have a table or view or something instead.
For now we are holding the assumption that each street name has two words. We start by counting them with REGEXP_COUNT. This is the subquery COUNTED_DATA with value WORDS as the count of words. Note I append a space to each line of input, in case there isn't one at the end of the input row of data, so that the count is correct.
I search for each word as
[^ ]+[ ]+
That is, one or more non-spaces followed by one or more spaces. I don't want to use zero or more spaces ([ ]*) because that is ambiguous.
Then we use a regular expression to pick out the last two words and the first (words minus 2) words.
Here's the result:
with sample_data as (
select '5 Kent Street' as addr from dual
union all select '3 A lindt Street' as addr from dual
union all select '2/15 bold Street' as addr from dual
union all select '9/34-36 carmen Road' as addr from dual
union all select '12/5a sandford Street' from dual
)
select
counted_data.addr as "Original Address",
substr (regexp_replace (addr || ' ', '(([^ ]+[ ]+){' || (words-2) ||'})([^ ].*)','\1'), 1, 10) as "Number",
substr (trim (regexp_replace (addr || ' ', '(([^ ]+[ ]+){' || (words-2) ||'})([^ ].*)','\3')), 1, 25) as "Street"
from
(
select sample_data.addr, regexp_count(addr || ' ', '[ ]+') as words
from sample_data
) counted_data
Original Address Number Street
--------------------- ---------- -------------------------
5 Kent Street 5 Kent Street
3 A lindt Street 3 A lindt Street
2/15 bold Street 2/15 bold Street
9/34-36 carmen Road 9/34-36 carmen Road
12/5a sandford Street 12/5a sandford Street
To make this readable I used 'substr' to cut down the length of the output columns. ("COLUMN" doesn't work in SQL Developer.)
Upvotes: 2
Reputation: 7807
I can suggest you three possibilities:
If the separation character is clear and simple, use as you suggest a combo of SUBSTR and INSTR.
If your database supports regular expression function, and it is simple matching your address field with a regular expression use it. For example MySQL has REGEXP.
But if the parsing of the string is complex and the variants are too many then use an external script with a complete programming language.
Connect to the database, for example in Java, process the string and insert the results in the new table. Remember to use bulk operations if your rows number is huge, so it will be faster.
Upvotes: 0