Kim Stacks
Kim Stacks

Reputation: 10812

How to use wildcard in the path to search jsonb values for postgres?

Using postgres version 10.13

This is my datatable jsongraphs

id jsongraph
1 { "data": {"scopes_by_id": { "121": { "id": 121, "pk": 121, "name": "Prework" } }, "commonsites_by_id": {"123": {"id": 123, "pk": 123, "name": "Somewhere over the rainbow"}}}}
2 { "data": {"scopes_by_id": { "156": { "id": 156, "pk": 156, "name": "ABC" } }, "commonsites_by_id": {"123": {"id": 123, "pk": 123, "name": "Somewhere over the rainbow"}}}}

I want the distinct values of scope id and site id which should be (121, 123), (156,123)

So I tried

SELECT DISTINCT
            jsongraph->'data'->'scopes_by_id'->>'pk' ,
            jsongraph->'data'->'commonsites_by_id'->>'pk' from jsongraphs;

This won't work because the path should be like data->scopes_by_id->121->>pk but I cannot know beforehand the value of 121 in between.

Is there a way to get the values of what I need by filling in some kind of wildcard in the path?

E.g.data->scopes_by_id->{*}->>pk like that?

ANd because this is legacy data, it's also hard to change the data itself.

Upvotes: 1

Views: 1105

Answers (1)

user330315
user330315

Reputation:

As the nesting level seems to be fixed, you could do something like this:

select j.id, scopes.*, commonsites.*
from jsongraphs j
  cross join lateral (
     select jsonb_agg(j.jsongraph #> array['data','scopes_by_id', t1.scope_id, 'pk']) as scope_ids
     from jsonb_each_text(j.jsongraph #> '{data,scopes_by_id}') as t1(scope_id)
  ) scopes
  cross join lateral ( 
     select jsonb_agg(j.jsongraph #> array['data','commonsites_by_id', t2.site_id, 'pk']) as common_ids
     from jsonb_each_text(j.jsongraph #> '{data,commonsites_by_id}') as t2(site_id)
  ) commonsites
order by id;

The sub-queries extract all key below the respective part (e.g. scopes_by_id) and then uses the #>' operator to access the path for each id inside the original JSON value. And finally all PK values are aggregated back into a single array.

This returns the PK values from each part separately as an array in order to handle the situation where you have a different number of "scope ids" and "commonsite ids"

If you just want "the first" id from each section, you can remove the aggregation and use a LIMIT clause:

select j.id, scopes.*, commonsites.*
from jsongraphs j
  cross join lateral (
     select j.jsongraph #> array['data','scopes_by_id', t1.scope_id, 'pk'] as scope_id
     from jsonb_each_text(j.jsongraph #> '{data,scopes_by_id}') as t1(scope_id)
     limit 1
  ) scopes
  cross join lateral ( 
     select j.jsongraph #> array['data','commonsites_by_id', t2.site_id, 'pk'] as common_id
     from jsonb_each_text(j.jsongraph #> '{data,commonsites_by_id}') as t2(site_id)
     limit 1
  ) commonsites
order by id;

Not sure on which level you want to apply the "distinct" part for this.


In Postgres 12 or later, you could achieve the same with:

select id, 
       jsonb_path_query_array(j.jsongraph, 'strict $.data.scopes_by_id.**.pk') as scopes,
       jsonb_path_query_array(j.jsongraph, 'strict $.data.commonsites_by_id.**.pk') as common
from jsongraphs ;
order by id;

Online example

Upvotes: 2

Related Questions