m01010011

Reputation: 1112

How to flatten / unnest / normalize nested JSON column in DuckDB (i.e., split nested fields into separate columns)?

Column types:

D describe some_table;
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ id          │ BIGINT      │ YES     │         │         │         │
│ data        │ JSON        │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘

Sample data:

D select id, data from some_table;
┌────────┬──────────────────────────────────────────────────────────────────────────────────┐
│   id   │                                    data                                          │
│ int64  │                                    json                                          │
├────────┼──────────────────────────────────────────────────────────────────────────────────┤
│      1 │ {"type":"abc","purpose":"ad","ts":"...","userId":"","context":{"ip":"x.x.x.x",...│
│      2 │ {"type":"abc","purpose":"search","ts":"...","userId":"ABCD1234","context":{"ip...│
│      3 │ {"type":"defghi","purpose":null,"ts":"...","userId":"","context":{"ip":"x.x.x....│
...
└───────────────────────────────────────────────────────────────────────────────────────────┘

Expectation:

┌──────────┬─────────┬─────┬──────────┬─────────────┬──────────────────┐
│ type     │ purpose │ ts  │ userId   │ context.ip  │ context.sth.else │
├──────────┼─────────┼─────┼──────────┼─────────────┼──────────────────┤
│ abc      │ ad      │ ... │          │             │                  │
│ abc      │ search  │ ... │ ABCD1234 │             │                  │
│ defghi   │ null    │ ... │          │             │                  │
└──────────┴─────────┴─────┴──────────┴─────────────┴──────────────────┘

I tried using unnest but it works only with structs / lists:

D select unnest(data) from some_table limit 5;
Error: Binder Error: UNNEST() can only be applied to lists, structs and NULL
LINE 1: select unnest(data) from some_table limit 5;
            ^

So, I then tried unnest with from_json based on Unnest JSON Array into Rows (pseudo-json_each), but that didn't work either:

D select unnest(from_json(data, '"JSON"')) from some_table limit 5;
Error: Binder Error: UNNEST() can only be applied to lists, structs and NULL
LINE 1: select unnest(from_json(data, '"JSON"')) from ...
            ^

I've also searched for existing solutions, but the ones I came across all seem to assume the data is stored in JSON files, whereas I've read this table from a CSV file. I want to flatten an existing JSON column.

Alternatively, if I could convert this JSON column into a STRUCT column, that would allow me to use the unnest function. But I couldn't find any online references for that, either.

TL;DR - I'm looking for the DuckDB equivalent of pandas.json_normalize.
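For reference, this is roughly the behaviour I'm after (a toy pandas example with made-up records, not my actual data):

import pandas as pd

records = [
    {"type": "abc", "purpose": "ad", "userId": "", "context": {"ip": "x.x.x.x"}},
    {"type": "abc", "purpose": "search", "userId": "ABCD1234", "context": {"ip": "y.y.y.y"}},
]

# nested dicts are flattened into dotted column names:
# type, purpose, userId, context.ip
print(pd.json_normalize(records))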

Upvotes: 7

Views: 3970

Answers (1)

jqurious

Reputation: 21404

As you say, it seems that there is no way to get read_json() behaviour (schema inference) without writing the JSON column out to a tempfile.

Converting to a struct appears to require specifying a schema.

duckdb.sql("from tbl select json_structure(data) limit 1").fetchone()[0]
# '{"type":"VARCHAR","purpose":"VARCHAR","ts":"VARCHAR","userId":"VARCHAR","context":{"ip":"VARCHAR"}}'
schema = duckdb.sql("from tbl select json_structure(data) limit 1").fetchone()[0]

duckdb.sql(f"from tbl select json_transform(data, '{schema}') as data")
# ┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
# │                                              data                                               │
# │ struct("type" varchar, purpose varchar, ts varchar, userid varchar, context struct(ip varchar)) │
# ├─────────────────────────────────────────────────────────────────────────────────────────────────┤
# │ {'type': abc, 'purpose': ad, 'ts': ..., 'userId': , 'context': {'ip': x.x.x.x}}                 │
# │ {'type': abc, 'purpose': search, 'ts': ..., 'userId': ABCD1234, 'context': {'ip': y.y.y.y}}     │
# │ {'type': defghi, 'purpose': NULL, 'ts': ..., 'userId': , 'context': {'ip': z.z.z.z}}            │
# └─────────────────────────────────────────────────────────────────────────────────────────────────┘
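Since unnest() does accept structs, the transformed column should also be unnestable in place, skipping the tempfile entirely. (A sketch reusing the schema string from above - note that json_structure on a single row can miss keys that only appear in later rows.)

duckdb.sql(f"""
from tbl
select
   id,
   unnest(json_transform(data, '{schema}'), recursive := true)
""")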

Example setup using Python API

import duckdb
import io
import tempfile

raw_csv = io.BytesIO(b'''
id,data
1,"{""type"":""abc"",""purpose"":""ad"",""ts"":""..."",""userId"":"""",""context"":{""ip"":""x.x.x.x""}}"
2,"{""type"":""abc"",""purpose"":""search"",""ts"":""..."",""userId"":""ABCD1234"",""context"":{""ip"":""y.y.y.y""}}"
3,"{""type"":""defghi"",""purpose"":null,""ts"":""..."",""userId"":"""",""context"":{""ip"":""z.z.z.z""}}"
'''.strip()
)

tmpfile = tempfile.NamedTemporaryFile()

# the relation is usable as `tbl` in the SQL below via DuckDB's
# Python replacement scans
tbl = duckdb.read_csv(raw_csv)

Tempfile

duckdb.sql(f"copy (from tbl select data::json as data) to '{tmpfile.name}' (format json)")
duckdb.sql(f"from read_json('{tmpfile.name}')")
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              data                                               │
│ struct("type" varchar, purpose varchar, ts varchar, userid varchar, context struct(ip varchar)) │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {'type': abc, 'purpose': ad, 'ts': ..., 'userId': , 'context': {'ip': x.x.x.x}}                 │
│ {'type': abc, 'purpose': search, 'ts': ..., 'userId': ABCD1234, 'context': {'ip': y.y.y.y}}     │
│ {'type': defghi, 'purpose': NULL, 'ts': ..., 'userId': , 'context': {'ip': z.z.z.z}}            │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘

Positional join

duckdb.sql(f"""
from read_json('{tmpfile.name}') j
positional join tbl
select
   tbl.* exclude (data),
   unnest(j.data, recursive := true)
""")
┌───────┬─────────┬─────────┬─────────┬──────────┬─────────┐
│  id   │  type   │ purpose │   ts    │  userId  │   ip    │
│ int64 │ varchar │ varchar │ varchar │ varchar  │ varchar │
├───────┼─────────┼─────────┼─────────┼──────────┼─────────┤
│     1 │ abc     │ ad      │ ...     │          │ x.x.x.x │
│     2 │ abc     │ search  │ ...     │ ABCD1234 │ y.y.y.y │
│     3 │ defghi  │ NULL    │ ...     │          │ z.z.z.z │
└───────┴─────────┴─────────┴─────────┴──────────┴─────────┘

However, I can't find any way to retain the "path" in a recursive unnest - it just uses the final "key" as the column name (e.g. context.ip becomes ip above).

It seems a feature request for this is needed.
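In the meantime, a manual workaround is to extract each field with a JSON path expression and alias it with the full dotted path (a sketch - this gives up the schema inference entirely):

duckdb.sql("""
from tbl
select
   id,
   data::json->>'$.type'       as "type",
   data::json->>'$.purpose'    as "purpose",
   data::json->>'$.ts'         as "ts",
   data::json->>'$.userId'     as "userId",
   data::json->>'$.context.ip' as "context.ip"
""")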

Upvotes: 6
