PostgreSQL: Query that splits array of one element into multiple elements based on a delimiter and unnest it

Question

I am new to PostgreSQl and I am not able to find a solution to get the output mentioned below.

I have created a table with a column of data type text[], which is an array of text strings (single element).

My table looks like this:

col1       col2
ID1      {"P25963(MI:0326), Homo sapiens);O14920(MI:0326), Homo sapiens)"}
ID2      {"Q8NFZ0(MI:0326), Homo sapiens);P12931(MI:0326), Homo sapiens)"}
ID3      {"P26368(MI:0326), Homo sapiens);Q15637(MI:0326), Homo sapiens); Q15638(MI:0326), Homo sapiens)"}

When I try to access the first and second element of col2, I get below output.

SELECT col2[1] FROM table;

P25963(MI:0326), Homo sapiens);O14920(MI:0326), Homo sapiens
Q8NFZ0(MI:0326), Homo sapiens);P12931(MI:0326), Homo sapiens
P26368(MI:0326), Homo sapiens);Q15637(MI:0326), Homo sapiens); Q15638(MI:0326), Homo sapiens

SELECT col2[2] FROM table;

NULL
NULL
NULL

I would like to split this array element into multiple elements (presence of ;), unnest them and later perform some regex on the unnested rows.

The desired output after splitting is:

col1       col2
ID1       {P25963(MI:0326), Homo sapiens},{O14920(MI:0326), Homo sapiens}
ID2       {Q8NFZ0(MI:0326), Homo sapiens},{P12931(MI:0326), Homo sapiens}
ID3       {P26368(MI:0326), Homo sapiens},{Q15637(MI:0326), Homo sapiens}, {Q15638(MI:0326), Homo sapiens}

The desired output after unnesting is:

col1     col3                                 col4   
ID1      P25963(MI:0326), Homo sapiens       O14920(MI:0326), Homo sapiens    
ID2      Q8NFZ0(MI:0326), Homo sapiens       P12931(MI:0326), Homo sapiens  
ID3      P26368(MI:0326), Homo sapiens       Q15637(MI:0326), Homo sapiens
ID3      P26368(MI:0326), Homo sapiens       Q15638(MI:0326), Homo sapiens

Any suggestion would be really helpful.

Thanks

GMB · Accepted Answer

I would just access the first element in the text array and use split_part() to get the underliying information:

select 
    col1,
    split_part(col2[1], ';', 1) col3,
    split_part(col2[1], ';', 2) col4
from mytable

Demo on DB Fiddle:

col1 | col3                           | col4                          
:--- | :----------------------------- | :-----------------------------
ID1  | P25963(MI:0326), Homo sapiens) | O14920(MI:0326), Homo sapiens)
ID2  | Q8NFZ0(MI:0326), Homo sapiens) | P12931(MI:0326), Homo sapiens)
ID3  | P26368(MI:0326), Homo sapiens) | Q15637(MI:0326), Homo sapiens)

From the comments: if you want to split over a variable number of elements, I would recommend spreading the data into rows rather than columns. For this, you can use string_to_array() to split the string into an array, then unnest() to generate rows:

select
    t.col1,
    s.pos,
    s.val
from mytable t
cross join lateral unnest(string_to_array(t.col2[1], ';')) with ordinality s(val, pos)

Demo:

col1 | pos | val                           
:--- | --: | :-----------------------------
ID1  |   1 | P25963(MI:0326), Homo sapiens)
ID1  |   2 | O14920(MI:0326), Homo sapiens)
ID2  |   1 | Q8NFZ0(MI:0326), Homo sapiens)
ID2  |   2 | P12931(MI:0326), Homo sapiens)
ID3  |   1 | P26368(MI:0326), Homo sapiens)
ID3  |   2 | Q15637(MI:0326), Homo sapiens)
ID3  |   3 | Q15638(MI:0326), Homo sapiens)

PostgreSQL: Query that splits array of one element into multiple elements based on a delimiter and unnest it

Answers (1)

Related Questions