Cheyne

Reputation: 2117

Collect Recursive JSON Keys In Postgres

I have JSON documents stored in Postgres under the JSON data type (Postgres 9.3) and I need to recursively collect the key names down the tree.

For example, given this JSON tree

{
  "files": {
    "folder": {
      "file1": { "property": "blah" },
      "file2": { "property": "blah" },
      "file3": { "property": "blah" },
      "file4": { "property": "blah" }
    }
  },
  "software": {
    "apt": {
      "package1": { "version": 1.2 },
      "package2": { "version": 1.2 },
      "package3": { "version": 1.2 },
      "package4": { "version": 1.2 }
    }
  }
}

I would like to extract something like [file1, file2, file3, file4, package1, package2, package3, package4]

Basically just a listing of keys that I can use for a text search index.

I know I can get a listing of keys on the outermost object using something like

SELECT DISTINCT(json_object_keys(data))

And I know it's possible to recursively climb through the tree using something like

WITH RECURSIVE data()

but I'm having trouble putting the two together.

Can anyone help?

Upvotes: 11

Views: 12206

Answers (4)

Hubbitus

Reputation: 5349

@Simon's answer above is great, but for my similar case (building a diff of JSON objects) I wanted the key paths in a JSONPath-like form, not just the last key name, including array indexes, and the values as well.

So, for the example {"A":[[[{"C":"B"}, {"D":"E"}]]],"X":"Y", "F": {"G": "H"}}, I need not only the keys X, D, G, C, F, A, but also the value at each path, e.g. .A[0][0][0].C = 'B'.

There are also some minor enhancements:

  1. The data type of each value is reported.
  2. The value itself is returned without extra quotes.

I hope it will be helpful to someone else as well:

-- Walk the document recursively, building a JSONPath-like key for every node.
WITH RECURSIVE _tree (key, value, type) AS (
    -- Seed row: the whole document, with no key yet.
    SELECT
        NULL AS key,
        '{"A":[[[{"C":"B"}, {"D":"E"}]]],"X":"Y", "F": {"G": "H"}}'::jsonb AS value,
        'object' AS type
    UNION ALL
    (
        WITH typed_values AS (
            SELECT key, jsonb_typeof(value) AS typeof, value FROM _tree
        )
        -- Objects: append ".<key>" for every member.
        SELECT CONCAT(tv.key, '.', v.key), v.value, jsonb_typeof(v.value)
        FROM typed_values AS tv, LATERAL jsonb_each(value) v
        WHERE typeof = 'object'
        UNION ALL
        -- Arrays: append "[<index>]" for every element (ordinality is 1-based, hence n - 1).
        SELECT CONCAT(tv.key, '[', n - 1, ']'), element.val, jsonb_typeof(element.val)
        FROM typed_values AS tv, LATERAL jsonb_array_elements(value) WITH ORDINALITY AS element (val, n)
        WHERE typeof = 'array'
    )
)
-- Every path except the root, with the value rendered as text and its jsonb type.
SELECT DISTINCT key, value #>> '{}' AS value, type
FROM _tree
WHERE key IS NOT NULL
ORDER BY key;

Dbfiddle to run.
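For the sample document above, the result contains one row per path; among them, the leaf paths look like this (my illustration of the output, not part of the original answer):

 key           | value | type
---------------+-------+--------
 .A[0][0][0].C | B     | string
 .A[0][0][1].D | E     | string
 .F.G          | H     | string
 .X            | Y     | string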

Upvotes: 2

Simon Kissane

Reputation: 5258

I wrote a function to do this:

CREATE OR REPLACE FUNCTION public.jsonb_keys_recursive(_value jsonb)
 RETURNS TABLE(key text)
 LANGUAGE sql
AS $function$
WITH RECURSIVE _tree (key, value) AS (
  SELECT
    NULL   AS key,
    _value AS value
  UNION ALL
  (WITH typed_values AS (SELECT jsonb_typeof(value) as typeof, value FROM _tree)
   SELECT v.*
     FROM typed_values, LATERAL jsonb_each(value) v
     WHERE typeof = 'object'
   UNION ALL
   SELECT NULL, element
     FROM typed_values, LATERAL jsonb_array_elements(value) element
     WHERE typeof = 'array'
  )
)
SELECT DISTINCT key
  FROM _tree
  WHERE key IS NOT NULL
$function$;

For an example, try:

SELECT jsonb_keys_recursive('{"A":[[[{"C":"B"}]]],"X":"Y"}');

Note that the other two answers don't find keys within objects nested inside arrays, but my solution does. (The question didn't give any examples with arrays, so finding keys inside arrays may not have been what the original asker needed, but it was what I needed.)
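To run it over a table rather than a literal, something like the following should work (a sketch: the table name docs and the json column data are hypothetical stand-ins for the question's setup, and the json column is cast to jsonb):

SELECT DISTINCT k.key
FROM docs, LATERAL jsonb_keys_recursive(docs.data::jsonb) AS k(key);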

Upvotes: 12

vincent

Reputation: 2181

A slightly more concise version that you can test as-is:

WITH RECURSIVE reports (key, value) AS (
  SELECT
    NULL as key,
    '{"k1": {"k2": "v1"}, "k3": {"k4": "v2"}, "k5": "v3"}'::JSONB as value

  UNION ALL

  SELECT
    jsonb_object_keys(value) as key,
    value -> jsonb_object_keys(value) as value
  FROM
    reports
  WHERE
    jsonb_typeof(value) = 'object'
)

SELECT
  *
FROM
  reports;

This will return every key/value pair as a flat list, which you then need to de-duplicate with DISTINCT.
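For example (my addition, not part of the original answer), the final SELECT can drop the seed row and de-duplicate:

SELECT DISTINCT key
FROM reports
WHERE key IS NOT NULL;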

Upvotes: 1

The trick is to add a terminating condition, using json_typeof in the right place.

You should also be using jsonb if you don't care about object key order.

Here is my working environment:

CREATE TABLE test (
  id  SERIAL PRIMARY KEY,
  doc JSON
);

INSERT INTO test (doc) VALUES ('{
  "files": {
    "folder": {
      "file1": { "property": "blah" },
      "file2": { "property": "blah" },
      "file3": { "property": "blah" },
      "file4": {
        "property": "blah",
        "prop": { "clap": "clap" }
      }
    }
  },
  "software": {
    "apt": {
      "package1": { "version": 1.2 },
      "package2": { "version": 1.2 },
      "package3": { "version": 1.2 },
      "package4": { "version": 1.2 }
    }
  }
}');

The recursion stops when the recursive part of the query returns no more rows. This is achieved by passing an empty object to json_each whenever the current value is not an object.
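For instance, json_each over an empty object yields no rows, which is what terminates those branches:

SELECT * FROM json_each('{}'::json);
-- (0 rows)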

WITH RECURSIVE doc_key_and_value_recursive(key, value) AS (
  -- Non-recursive part: every top-level key/value pair of every document.
  SELECT
    t.key,
    t.value
  FROM test, json_each(test.doc) AS t

  UNION ALL

  -- Recursive part: descend into object values; non-object values are swapped
  -- for an empty object so json_each yields no rows and that branch stops.
  SELECT
    t.key,
    t.value
  FROM doc_key_and_value_recursive,
    json_each(CASE
      WHEN json_typeof(doc_key_and_value_recursive.value) <> 'object' THEN '{}'::JSON
      ELSE doc_key_and_value_recursive.value
    END) AS t
)
SELECT *
FROM doc_key_and_value_recursive
WHERE json_typeof(doc_key_and_value_recursive.value) <> 'object';
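If, as in the question, you want only the distinct key names rather than the leaf key/value pairs, a variant (my sketch, not part of the original answer) is to change the final SELECT to:

SELECT DISTINCT key
FROM doc_key_and_value_recursive;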

Upvotes: 18
