Tomer Shalhon
Tomer Shalhon

Reputation: 225

How to parse JSON with unknown number of keys and nested JSONs, with duplicate keys

I have a JSON that looks something like that one:

{
"condition_a":{
    "name": "aaa",
    "value": 10,
    "source": "bbb"
},
"condition_a":{
    "name": "ccc",
    "value": 20,
    "source": "ddd"
},
"condition_b":{
    "condition_c":{
        "name": "ddd",
        "value": 30,
        "source": "eee"
    },
    "condition_c":{
        "name": "fff",
        "value": 40,
        "source": "ggg"
    }
}

}

It has unknown number of keys, some of them hold nested JSONs within them, some of them don't. The key names are also changing and unknown. In addition, it has duplicate keys, like condition_a is duplicated throughout the JSON but holds different values within.

I want to know how to parse this JSON onto a table that looks like this: enter image description here

con_1 and con_2 are the number of levels of nested JSONs it found to be - which is also unknown!

Thank you in advance!

Upvotes: 0

Views: 104

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Consider below approach

select 
  ifnull(con_1, con_2) con_1,
  if(con_1 is null, null, con_2) con_2,
  name, value, source
from (
  select  kv,
    regexp_extract(norm_text, r'"([^"]+)":{(?:"[^"]+":{[^{}]+},)*' || kv || '.*}') con_1,
    trim(split(kv, ':')[offset(0)], '"') con_2, 
    regexp_extract(kv, r'"name": "([^"]+)"') name,
    regexp_extract(kv, r'"value": (\d+)') value,
    regexp_extract(kv, r'"source": "([^"]+)"') source,
  from your_table,
  unnest([regexp_replace(regexp_replace(regexp_replace(text, r'\n', ''), r'({|,)(\s+)(")', r'\1\3'), r'(")(\s+)(})', r'\1\3')]) norm_text,
  unnest(regexp_extract_all(norm_text, r'"[^"]+":{\s*"name": "[^"]+",\s*"value": \d+,\s*"source": "[^"]+"\s*}')) kv
) 

    

if applied to sample data in your question - output is

enter image description here

Upvotes: 1

Related Questions