user1332821
user1332821

Reputation: 83

Split string into two columns (SQL or PL/SQL)

I need to split a string from one table into two columns in another table. There are a few different variety of numbers and rules but no clear delimiter. Can I use combo of SUBSTR and INSTR or do I need to use if-then loops in PL/SQL to satisfy all the rules?

Input Table

5 Kent Street 
3 A lindt Street
2/15 bold Street 
9/34-36 carmen Road
12/5a sandford Street

Result

Number  |Street
--------------------
5       |Kent Street
3A      |lindt Street
2/15    |bold Street
9/34-36 |carmen Road
12/5a   |sandford Street

Upvotes: 1

Views: 3722

Answers (2)

Andrew Wolfe
Andrew Wolfe

Reputation: 2096

I wouldn't do this in pl/sql, it is really unnecessary.

Oracle SQL has REGEXP_SUBSTR, REGEXP_REPLACE, REGEXP_COUNT. You also have IF and CASE expressions you can put in the SELECT clause. Go to the SQL reference, FUNCTIONS section: http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions.htm#i1482196

Your dataset has some interesting variations - letters and fractions in the house/building number. What you don't have (yet) is street names in multiple parts (e.g. Melnea Cass Boulevard here in Boston) or street names with concatenated/missing ("Broadway") or unusual ("Cedar Passway") 'street' designators.

Start with sample_data as a refactored query to hold your input data. You'll probably have a table or view or something instead.

For now we are holding the assumption that each street name has two words. We start by counting them with REGEXP_COUNT. This is the subquery COUNTED_DATA with value WORDS as the count of words. Note I append a space to each line of input, in case there isn't one at the end of the input row of data, so that the count is correct.

I search for each word as

[^ ]+[ ]+

That is, one or more non-spaces followed by one or more spaces. I don't want to use zero or more spaces ([ ]*) because that is ambiguous.

Then we use a regular expression to pick out the last two words and the first (words minus 2) words.

Here's the result:

with sample_data as (
    select '5 Kent Street' as addr from dual
    union all select '3 A lindt Street' as addr from dual
    union all select '2/15 bold Street' as addr from dual
    union all select '9/34-36 carmen Road' as addr from dual
    union all select '12/5a sandford Street' from dual
)
select 
    counted_data.addr as "Original Address",
    substr (regexp_replace (addr || ' ', '(([^ ]+[ ]+){' || (words-2) ||'})([^ ].*)','\1'), 1, 10) as "Number",
    substr (trim (regexp_replace (addr || ' ', '(([^ ]+[ ]+){' || (words-2) ||'})([^ ].*)','\3')), 1, 25) as "Street"
from
(
    select sample_data.addr, regexp_count(addr || ' ', '[ ]+') as words
    from sample_data
) counted_data

Original Address      Number     Street                   
--------------------- ---------- -------------------------
5 Kent Street         5          Kent Street               
3 A lindt Street      3 A        lindt Street              
2/15 bold Street      2/15       bold Street               
9/34-36 carmen Road   9/34-36    carmen Road               
12/5a sandford Street 12/5a      sandford Street           

To make this readable I used 'substr' to cut down the length of the output columns. ("COLUMN" doesn't work in SQL Developer.)

Upvotes: 2

dash1e
dash1e

Reputation: 7807

I can suggest you three possibilities:

  1. If the separation character is clear and simple, use as you suggest a combo of SUBSTR and INSTR.

  2. If your database supports regular expression function, and it is simple matching your address field with a regular expression use it. For example MySQL has REGEXP.

  3. But if the parsing of the string is complex and the variants are too many then use an external script with a complete programming language.

    Connect to the database, for example in Java, process the string and insert the results in the new table. Remember to use bulk operations if your rows number is huge, so it will be faster.

Upvotes: 0

Related Questions