rgareth
rgareth

Reputation: 3536

Postgresql: denormalize JSONB with nested key/values and arrays

If needing to denormalize a nested JSONB structure into a flat table, can it be realistically done in a single SELECT if I'm mixing key/values and arrays like the following? I have table containing two columns - repository and data and an example data column value is like the following:

{
    "branches": {
        "master": {
            "type1": {
                "files": [{
                    "filename": "a.txt",
                    "contents": [{
                        "identifier": "foo",
                        "value": "bar"
                    }]
                }]
            }
        }
    }
}

My goal is to have a query that flattens this structure e.g. into the following CSV pseudocode:

repository,branch,type,filename,identifier,value
a/b,master,type1,a.txt,foo,bar

I have kept the JSON example as simple as possible, but would ultimately end up with around a million lines.

Although the need for a single data JSON column is fixed, the format of it can be refactored if it would make the denormalization easier or more efficient. e.g. instead of branches and types being keyed off their unique identifiers, these could be converted into arrays with identifiers instead.

Upvotes: 1

Views: 905

Answers (1)

user330315
user330315

Reputation:

Assuming the elements of contents array should be returned as rows and a valid JSON structure like this:

{
  "branches": {
    "master": {
      "type1": {
        "files": [{"filename": "a.txt","contents": [{"identifier": "foo","value": "bar"}, {"identifier": "a2", "value": "a value"}]},
                  {"filename": "b.txt","contents": [{"identifier": "id2","value": "something else"}]}]
      }
    }
  }
}  

Then you can use this:

select b.branch, 
       t.type, 
       f.file ->> 'filename' as filename, 
       c.content ->> 'identifier' as identifier, 
       c.content ->> 'value' as value
from the_table tt
  cross join jsonb_each(tt.the_column -> 'branches') as b(branch, branch_value)
  cross join jsonb_each(b.branch_value) as t(type, type_value)
  cross join jsonb_array_elements(t.type_value -> 'files') as f(file)
  cross join jsonb_array_elements(f.file -> 'contents') as c(content)

With the corrected JSON above, this returns:

branch | type  | filename | identifier | value         
-------+-------+----------+------------+---------------
master | type1 | a.txt    | foo        | bar           
master | type1 | a.txt    | a2         | a value       
master | type1 | b.txt    | id2        | something else

If contents isn't actually an array, so something like this:

{
  "branches": {
    "master": {
      "type1": {
        "files": [{"filename": "a.txt","contents": {"identifier": "foo","value": "bar"}},
                  {"filename": "b.txt","contents": {"identifier": "id2","value": "something else"}}]
      }
    }
  }
}

Then you can use this query:

select b.branch, t.type, f.file ->> 'filename' as filename, f.file #>> '{contents, identifier}' as identifier, f.file #>> '{contents,value}' as value
from the_table tt
  cross join jsonb_each(tt.the_column -> 'branches') as b(branch, branch_value)
  cross join jsonb_each(b.branch_value) as t(type, type_value)
  cross join jsonb_array_elements(t.type_value -> 'files') as f(file)

which returns:

branch | type  | filename | identifier | value         
-------+-------+----------+------------+---------------
master | type1 | a.txt    | foo        | bar           
master | type1 | b.txt    | id2        | something else

Upvotes: 2

Related Questions