Matthew
Matthew

Reputation: 383

Split array of values into rows in Amazon Redshift

How can i split an array of values in a column into corresponding rows in Redshift using a delimiter (,) ?

Input Data:-

—————————————  
Empid | Items  
—————————————  
1001| A, B  
1002| B  
1003| C, D, E  

Required Output:-

—————————————  
Empid | Items  
—————————————  
1001| A  
1001| B  
1002| B  
1003| C  
1003| D  
1003| E  

Any help is appreciated.

Thanks

Upvotes: 4

Views: 14371

Answers (2)

Balint
Balint

Reputation: 403

Based on the official docs, you can do with JOIN!

Let's say your input is:

—————————————  
empid | items  
—————————————  
1001| [A, B]  
1002| [B]  
1003| [C, D, E]
1004| []

Then you can do it as:

SELECT t.empid, items as item
FROM table_name AS t
    LEFT JOIN t.items AS items ON TRUE

This will returns:

—————————————  
empid | item 
—————————————  
1001| A  
1001| B  
1002| B  
1003| C  
1003| D  
1003| E
1004| <NULL>  

Upvotes: 12

demircioglu
demircioglu

Reputation: 3455

Actually with the addition of stored procedures to Redshift this is possible

The procedure below accepts two parameters (source_table and target_table) assuming both table exists it transforms the data described in the question

The way it works is

  1. Reads data from source table row by row
  2. Finds out max items in Items column
  3. In a loop extracts each item
  4. Inserts id + item combo into target table
CREATE OR REPLACE PROCEDURE Array_to_Rows(source_table VARCHAR, target_table VARCHAR) 
LANGUAGE plpgsql
AS $$
DECLARE i INTEGER;
        rec RECORD;
        query VARCHAR;
        item VARCHAR;
        cnt INTEGER;
BEGIN
    query := 'SELECT * FROM ' || source_table;

    FOR rec IN EXECUTE query
    LOOP
        select INTO cnt regexp_count(rec.items,',')+1;
        i := 1; 

        << items_loop >>
        LOOP
            SELECT INTO item trim(split_part(rec.items,',',i));
            EXECUTE 'INSERT INTO ' || target_table || ' values (' || rec.Empid || ',''' || item ||''')';
            i := i + 1;
            EXIT items_loop WHEN (i > cnt);
        END LOOP;  
    END LOOP;      
END;
$$

Usage: CALL Array_to_Rows('source table name','target table name')

With test data in the question it took less than 0.2 seconds, don't know how big OPs data set is

Output is

Empid   item
1001    A
1001    B
1002    B
1003    C
1003    D
1003    E

Upvotes: 0

Related Questions