Federico Bonelli
Federico Bonelli

Reputation: 849

Split a CSV field into different rows in SQL

A colleague of mines encountered this problem while working on a COBOL program and eventually solved it at the application level. Still I am curious if it is possible to solve it on the data access level, with SQL. This is somehow related to this other question, but I'd like to use only ANSI SQL.

I'm looking for a single SQL select query that acts on a VARCHAR field that contains variable length CSV rows. The purpose of the query is to split every CSV field in its own result set row.

Here is an example with schema and data (and here is the fiddle):

CREATE TABLE table1 (`field` varchar(100));

 INSERT INTO table1 (`field`)
      VALUES
             ('Hello,world,!')    ,
             ('Haloa,!')          ,
             ('Have,a,nice,day,!');

Here is the output I'd like to have from the query:

Hello
world
!
Haloa
!
Have
a
nice
day
!

The CSV separator used is the comma, and for now I wouldn't worry about escaping.

Upvotes: 0

Views: 849

Answers (3)

user330315
user330315

Reputation:

As far as I can tell, this is ANSI SQL:

with recursive word_list (field, word, rest, field_id, level) as (             
  select field, 
         substring(field from 1 for position(',' in field) - 1) as word,
         substring(field from position(',' in field) + 1) as rest,
         row_number() over () as field_id,
         1
  from table1
  union all 
  select c.field, 
         case 
            when position(',' in p.rest) = 0 then p.rest
            else substring(p.rest from 1 for position(',' in p.rest) - 1) 
         end as word,
         case 
            when position(',' in p.rest) = 0 then null
            else substring(p.rest from position(',' in p.rest) + 1) 
         end as rest, 
         p.field_id,
         p.level + 1
  from table1 as c
    join word_list p on c.field = p.field and position(',' in p.rest) >= 0
) 
select word
from word_list
order by field_id, level;
     

This assumes that the values in field are unique.

Here is a running example

Upvotes: 4

Gary_W
Gary_W

Reputation: 10360

FWIW, here's another Oracle specific method. Maybe it will give an idea or help a future searcher at least.

SQL> with tbl(rownbr, col1) as (
           select 1, 'Hello,world,!'     from dual union
           select 2, 'Haloa,!'           from dual union
           select 3, 'Have,a,nice,day,!' from dual
      )
   SELECT rownbr, column_value substring_nbr,
        regexp_substr(col1, '(.*?)(,|$)', 1, column_value, null, 1)
    FROM tbl,
                TABLE(
                  CAST(
                    MULTISET(SELECT LEVEL
                                FROM dual
                                CONNECT BY LEVEL <= REGEXP_COUNT(col1, ',')+1
                            ) AS sys.OdciNumberList
                  )
                )
      order by rownbr, substring_nbr;

    ROWNBR SUBSTRING_NBR REGEXP_SUBSTR(COL
---------- ------------- -----------------
         1             1 Hello
         1             2 world
         1             3 !
         2             1 Haloa
         2             2 !
         3             1 Have
         3             2 a
         3             3 nice
         3             4 day
         3             5 !

10 rows selected.

SQL>

Upvotes: 0

massko
massko

Reputation: 589

In Oracle you could use something like that (maybe it's not the most elegant but it gives the result you want) - simply replace tab with your_table_name:

WITH 
tab2 AS (
SELECT t.field,
       CASE WHEN INSTR(t.field, ',', 1, 1) > 0 AND regexp_count(t.field,',') >= 1 THEN INSTR(t.field, ',', 1, 1) ELSE NULL END AS pos1,
       CASE WHEN INSTR(t.field, ',', 1, 2) > 0 AND regexp_count(t.field,',') >= 2 THEN INSTR(t.field, ',', 1, 2) ELSE NULL END AS pos2,
       CASE WHEN INSTR(t.field, ',', 1, 3) > 0 AND regexp_count(t.field,',') >= 3 THEN INSTR(t.field, ',', 1, 3) ELSE NULL END AS pos3,
       CASE WHEN INSTR(t.field, ',', 1, 4) > 0 AND regexp_count(t.field,',') >= 4 THEN INSTR(t.field, ',', 1, 4) ELSE NULL END AS pos4,
       CASE WHEN INSTR(t.field, ',', 1, 5) > 0 AND regexp_count(t.field,',') >= 5 THEN INSTR(t.field, ',', 1, 5) ELSE NULL END AS pos5,
       CASE WHEN INSTR(t.field, ',', 1, 6) > 0 AND regexp_count(t.field,',') >= 6 THEN INSTR(t.field, ',', 1, 6) ELSE NULL END AS pos6
FROM tab t
),
tab3 AS (
SELECT SUBSTR(tt.field,1,tt.pos1-1) AS col1,
       SUBSTR(tt.field,tt.pos1+1, CASE WHEN tt.pos2 IS NULL THEN LENGTH(tt.field) - tt.pos1 ELSE tt.pos2 - tt.pos1 - 1 END) AS col2,
       SUBSTR(tt.field,tt.pos2+1, CASE WHEN tt.pos3 IS NULL THEN LENGTH(tt.field) - tt.pos2 ELSE tt.pos3 - tt.pos2 - 1 END) AS col3,
       SUBSTR(tt.field,tt.pos3+1, CASE WHEN tt.pos4 IS NULL THEN LENGTH(tt.field) - tt.pos3 ELSE tt.pos4 - tt.pos3 - 1 END) AS col4,
       SUBSTR(tt.field,tt.pos4+1, CASE WHEN tt.pos5 IS NULL THEN LENGTH(tt.field) - tt.pos4 ELSE tt.pos5 - tt.pos4 - 1 END) AS col5,
       SUBSTR(tt.field,tt.pos5+1, CASE WHEN tt.pos6 IS NULL THEN LENGTH(tt.field) - tt.pos5 ELSE tt.pos6 - tt.pos5 - 1 END) AS col6
       ,ROWNUM AS r
FROM tab2 tt
),
tab4 AS (
SELECT ttt.col1 AS col FROM tab3 ttt WHERE r  = 1
UNION ALL SELECT ttt.col2 FROM tab3 ttt WHERE r  = 1
UNION ALL SELECT ttt.col3 FROM tab3 ttt WHERE r  = 1
UNION ALL SELECT ttt.col4 FROM tab3 ttt WHERE r  = 1
UNION ALL SELECT ttt.col5 FROM tab3 ttt WHERE r  = 1
UNION ALL SELECT ttt.col6 FROM tab3 ttt WHERE r  = 1
UNION ALL
SELECT ttt.col1 FROM tab3 ttt WHERE r  = 2
UNION ALL SELECT ttt.col2 FROM tab3 ttt WHERE r  = 2
UNION ALL SELECT ttt.col3 FROM tab3 ttt WHERE r  = 2
UNION ALL SELECT ttt.col4 FROM tab3 ttt WHERE r  = 2
UNION ALL SELECT ttt.col5 FROM tab3 ttt WHERE r  = 2
UNION ALL SELECT ttt.col6 FROM tab3 ttt WHERE r  = 2
UNION ALL
SELECT ttt.col1 FROM tab3 ttt WHERE r  = 3
UNION ALL SELECT ttt.col2 FROM tab3 ttt WHERE r  = 3
UNION ALL SELECT ttt.col3 FROM tab3 ttt WHERE r  = 3
UNION ALL SELECT ttt.col4 FROM tab3 ttt WHERE r  = 3
UNION ALL SELECT ttt.col5 FROM tab3 ttt WHERE r  = 3
UNION ALL SELECT ttt.col6 FROM tab3 ttt WHERE r  = 3
UNION ALL
SELECT ttt.col1 FROM tab3 ttt WHERE r  = 4
UNION ALL SELECT ttt.col2 FROM tab3 ttt WHERE r  = 4
UNION ALL SELECT ttt.col3 FROM tab3 ttt WHERE r  = 4
UNION ALL SELECT ttt.col4 FROM tab3 ttt WHERE r  = 4
UNION ALL SELECT ttt.col5 FROM tab3 ttt WHERE r  = 4
UNION ALL SELECT ttt.col6 FROM tab3 ttt WHERE r  = 4
UNION ALL
SELECT ttt.col1 FROM tab3 ttt WHERE r  = 5
UNION ALL SELECT ttt.col2 FROM tab3 ttt WHERE r  = 5
UNION ALL SELECT ttt.col3 FROM tab3 ttt WHERE r  = 5
UNION ALL SELECT ttt.col4 FROM tab3 ttt WHERE r  = 5
UNION ALL SELECT ttt.col5 FROM tab3 ttt WHERE r  = 5
UNION ALL SELECT ttt.col6 FROM tab3 ttt WHERE r  = 5
)
SELECT col
FROM tab4
WHERE col IS NOT NULL

it gives me the result:

1   Hello
2   world
3   !
4   Haloa
5   !
6   Have
7   a
8   nice
9   day
10  !

Upvotes: 0

Related Questions