Reputation: 1475
I am new to PostgreSQl and I am not able to find a solution to get the output mentioned below.
I have created a table with a column of data type text[], which is an array of text strings (single element).
My table looks like this:
col1 col2
ID1 {"P25963(MI:0326), Homo sapiens);O14920(MI:0326), Homo sapiens)"}
ID2 {"Q8NFZ0(MI:0326), Homo sapiens);P12931(MI:0326), Homo sapiens)"}
ID3 {"P26368(MI:0326), Homo sapiens);Q15637(MI:0326), Homo sapiens); Q15638(MI:0326), Homo sapiens)"}
When I try to access the first and second element of col2, I get below output.
SELECT col2[1] FROM table;
P25963(MI:0326), Homo sapiens);O14920(MI:0326), Homo sapiens
Q8NFZ0(MI:0326), Homo sapiens);P12931(MI:0326), Homo sapiens
P26368(MI:0326), Homo sapiens);Q15637(MI:0326), Homo sapiens); Q15638(MI:0326), Homo sapiens
SELECT col2[2] FROM table;
NULL
NULL
NULL
I would like to split this array element into multiple elements (presence of ;), unnest them and later perform some regex on the unnested rows.
The desired output after splitting is:
col1 col2
ID1 {P25963(MI:0326), Homo sapiens},{O14920(MI:0326), Homo sapiens}
ID2 {Q8NFZ0(MI:0326), Homo sapiens},{P12931(MI:0326), Homo sapiens}
ID3 {P26368(MI:0326), Homo sapiens},{Q15637(MI:0326), Homo sapiens}, {Q15638(MI:0326), Homo sapiens}
The desired output after unnesting is:
col1 col3 col4
ID1 P25963(MI:0326), Homo sapiens O14920(MI:0326), Homo sapiens
ID2 Q8NFZ0(MI:0326), Homo sapiens P12931(MI:0326), Homo sapiens
ID3 P26368(MI:0326), Homo sapiens Q15637(MI:0326), Homo sapiens
ID3 P26368(MI:0326), Homo sapiens Q15638(MI:0326), Homo sapiens
Any suggestion would be really helpful.
Thanks
Upvotes: 0
Views: 3183
Reputation: 222682
I would just access the first element in the text array and use split_part()
to get the underliying information:
select
col1,
split_part(col2[1], ';', 1) col3,
split_part(col2[1], ';', 2) col4
from mytable
col1 | col3 | col4 :--- | :----------------------------- | :----------------------------- ID1 | P25963(MI:0326), Homo sapiens) | O14920(MI:0326), Homo sapiens) ID2 | Q8NFZ0(MI:0326), Homo sapiens) | P12931(MI:0326), Homo sapiens) ID3 | P26368(MI:0326), Homo sapiens) | Q15637(MI:0326), Homo sapiens)
From the comments: if you want to split over a variable number of elements, I would recommend spreading the data into rows rather than columns. For this, you can use string_to_array()
to split the string into an array, then unnest()
to generate rows:
select
t.col1,
s.pos,
s.val
from mytable t
cross join lateral unnest(string_to_array(t.col2[1], ';')) with ordinality s(val, pos)
Demo:
col1 | pos | val :--- | --: | :----------------------------- ID1 | 1 | P25963(MI:0326), Homo sapiens) ID1 | 2 | O14920(MI:0326), Homo sapiens) ID2 | 1 | Q8NFZ0(MI:0326), Homo sapiens) ID2 | 2 | P12931(MI:0326), Homo sapiens) ID3 | 1 | P26368(MI:0326), Homo sapiens) ID3 | 2 | Q15637(MI:0326), Homo sapiens) ID3 | 3 | Q15638(MI:0326), Homo sapiens)
Upvotes: 1