Miguel
Miguel

Reputation: 187

Redshift get Json Keys

I have a table field with a value like this:

{
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c": {
        "alert": "345",
        "channel": "ios_push",
        "name": "Variant 1"
    },
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25": {
        "channel": "ios_push",
        "name": "Control Group",
        "type": "control"
    }
}

I want to know if there is some way to get the "0fc8a2a1-e334-43b8-9311-ce46da9cd32c" and "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25" values.

Upvotes: 4

Views: 5362

Answers (2)

nbtk
nbtk

Reputation: 3149

Although a year have past since this question was opened, but I found how to use it using Redshift native capabilities.

Redshift has a SUPER column type that stores native JSONs: https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html

And can be queried using PartiQL:

https://docs.aws.amazon.com/redshift/latest/dg/query-super.html

https://partiql.org/tutorial.html

Assuming a table called MyTable with a column named data of type SUPER that can store JSON, I have created the following query:

SELECT
  key
FROM
  MyTable AS t,
  UNPIVOT t.data AS value AT key;
  • In case of JSON arrays, the syntax x AS y AT z means "foreach Y in X" where z is the index of object y inside an array x
  • In case of unpivoting JSON objects, the syntax means the same for x and y, but z is the key, instead of the index (When you think about it, also array can be represented in an Object manner, where the index is the key: `{0: 'a', 1: 'b', ... }

The query results are:

create temp table if not exists MyTable(
  data SUPER
);
insert into MyTable VALUES (json_parse('{"a": 1, "b":2}'));

select key from MyTable as t, unpivot t.data as value at key;

enter image description here

Upvotes: 9

botchniaque
botchniaque

Reputation: 5084

Redshift is not good with JSON, and especially not good with arbitrary JSON keys (as @GMB mentioned). Also it's not good with nested data structures.

So actually, you have 2 problems:

  1. extacting the json keys. I see 2 options here:
  • Use python UDF
  • Use regex
  1. unnesting an array of keys into a table. There's a trick to unnest data into rows (see CROSS JOIN with seq table in queries belowbelow) - described in this SO answer.

1. Solution with python UDF

you can implement json parsing in python and register it as a user defined function https://docs.aws.amazon.com/redshift/latest/dg/udf-python-language-support.html

function:

create or replace function f_py_json_keys (a varchar(65535))
    returns varchar(65535)
    stable
as $$
    import json
    return ",".join(json.loads(a).keys())
$$ language plpythonu;

query:

with input(json) as (
    select '{
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c": {
        "alert": "345",
        "channel": "ios_push",
        "name": "Variant 1"
    },
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25": {
        "channel": "ios_push",
        "name": "Control Group",
        "type": "control"
    }
}'::varchar
), seq(idx) as (
    select 1 UNION ALL
    select 2 UNION ALL
    select 3 UNION ALL
    select 4 UNION ALL
    select 5
), input_with_occurences as (
    select f_py_json_keys(json) as keys,
           regexp_count(keys, ',') + 1 as number_of_occurrences
    from input
)
select
    split_part(keys, ',', idx) as id
from input_with_occurences cross join seq
where idx <= number_of_occurrences

2. Solution with REGEX magic

Redshift has some regex functions. That's a working example that would do the job for the payload you specified:

with input(json) as (
    select '{
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c": {
        "alert": "345",
        "channel": "ios_push",
        "name": "Variant 1"
    },
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25": {
        "channel": "ios_push",
        "name": "Control Group",
        "type": "control"
    }
}'::varchar
), seq(idx) as (
    select 1 UNION ALL
    select 2 UNION ALL
    select 3 UNION ALL
    select 4 UNION ALL
    select 5
), input_with_occurences as (
    select *,
           regexp_count(json,
                        '\\{?\\"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\\":\\s\\{[\\w\\s\\":,]+\\}') as number_of_occurrences
    from input
)
select
       REGEXP_SUBSTR(json, '\\{?\\"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\\":\\s\\{[\\w\\s\\":,]+\\}', 1, idx, 'e') as id
       from input_with_occurences cross join seq
        where idx <= number_of_occurrences

result looks like:

+------------------------------------+
|id                                  |
+------------------------------------+
|0fc8a2a1-e334-43b8-9311-ce46da9cd32c|
|4344d89b-7f0d-4453-b2c5-d0d4a39d7d25|
+------------------------------------+

Upvotes: 4

Related Questions