Reputation: 2823
I'm pulling data from an API in JSON with a format like the example data below. Where essentially every "row" is an array of values. The API doc defines the columns and their types in advance. So I know the col1 is, for example, a varchar, and that col2 is an int.
CREATE TEMP TABLE dat (data json);
INSERT INTO dat
VALUES ('{"COLUMNS":["col1","col2"],"DATA":[["a","1"],["b","2"]]}');
I want to transform this within PostgreSQL 9.3 such that I end up with:
col1 | col2
------------
a | 1
b | 2
Using json_array_elements
I can get to:
SELECT json_array_elements(data->'DATA')
FROM dat
json_array_elements
json
---------
["a","1"]
["b","2"]
but then I can't figure out how to do either convert the JSON array to a PostgreSQL array so I can perform something like unnest(ARRAY['a','1'])
Upvotes: 9
Views: 18147
Reputation: 369
this code worked fine for me, maybe it be useful for someone.
select to_json(array_agg(t))
from (
select text, pronunciation,
(
select array_to_json(array_agg(row_to_json(d)))
from (
select part_of_speech, body
from definitions
where word_id=words.id
order by position asc
) d
) as definitions
from words
where text = 'autumn'
) t
Credits: https://hashrocket.com/blog/posts/faster-json-generation-with-postgresql
Upvotes: 0
Reputation: 324325
To get a result like
col1 | col2
------------
a | 1
b | 2
will require a bunch of dynamic SQL, because you don't know the types of the columns in advance, nor the column names.
You can unpack the json with something like:
SELECT
json_array_element_text(colnames, colno) AS colname,
json_array_element_text(colvalues, colno) AS colvalue,
rn,
idx,
colno
FROM (
SELECT
data -> 'COLUMNS' AS colnames,
d AS colvalues,
rn,
row_number() OVER () AS idx
FROM (
SELECT data, row_number() OVER () AS rn FROM dat
) numbered
cross join json_array_elements(numbered.data -> 'DATA') d
) elements
cross join generate_series(0, json_array_length(colnames) - 1) colno;
producing a result set like:
colname | colvalue | rn | idx | colno
---------+----------+----+-----+-------
col1 | a | 1 | 1 | 0
col2 | 1 | 1 | 1 | 1
col1 | b | 1 | 2 | 0
col2 | 2 | 1 | 2 | 1
(4 rows)
You can then use this as input to the crosstab function from the tablefunc module with something like:
SELECT * FROM crosstab('
SELECT
to_char(rn,''00000000'')||''_''||to_char(idx,''00000000'') AS rowid,
json_array_element_text(colnames, colno) AS colname,
json_array_element_text(colvalues, colno) AS colvalue
FROM (
SELECT
data -> ''COLUMNS'' AS colnames,
d AS colvalues,
rn,
row_number() OVER () AS idx
FROM (
SELECT data, row_number() OVER () AS rn FROM dat
) numbered
cross join json_array_elements(numbered.data -> ''DATA'') d
) elements
cross join generate_series(0, json_array_length(colnames) - 1) colno;
') results(rowid text, col1 text, col2 text);
producing:
rowid | col1 | col2
---------------------+------+------
00000001_ 00000001 | a | 1
00000001_ 00000002 | b | 2
(2 rows)
The column names are not retained here.
If you were on 9.4 you could avoid the row_number()
calls and use WITH ORDINALITY
, making it much cleaner.
Since you apparently know the number of columns and their types in advance the query can be considerably simplified.
SELECT
col1, col2
FROM (
SELECT
rn,
row_number() OVER () AS idx,
elem ->> 0 AS col1,
elem ->> 1 :: integer AS col2
FROM (
SELECT data, row_number() OVER () AS rn FROM dat
) numbered
cross join json_array_elements(numbered.data -> 'DATA') elem
ORDER BY 1, 2
) x;
result:
col1 | col2
------+------
a | 1
b | 2
(2 rows)
WITH ORDINALITY
If you were using 9.4 you could keep it cleaner using WITH ORDINALITY
:
SELECT
col1, col2
FROM (
SELECT
elem ->> 0 AS col1,
elem ->> 1 :: integer AS col2
FROM
dat
CROSS JOIN
json_array_elements(dat.data -> 'DATA') WITH ORDINALITY AS elements(elem, idx)
ORDER BY idx
) x;
Upvotes: 10