Batuhan Şekerci
Batuhan Şekerci

Reputation: 43

Duplicate results when using $.** recursive expression in postgre

I have a timescaleDB, I am trying to make a query using jsonb_path_query but I am getting duplicate results.

Table structure:

SELECT * FROM public.meta_table_cum_115

"time"      "COLUMNNAME"    
"2023-12-22 12:25:31.001244"    "{""1"": [{""results"": {""2"": { ""name"": ""AGE"", ""label"": ""12""}}, ""name"": ""PEOPLE"", ""label"": ""toast""}]}"    

As you can see there is a nested structure.

I found a mailing at PostgreSQL Website about this issue. As I understand it was a bug, but I cannot find a solution anywhere.

When I run a query:

select sum(count) as count,label from (select count('label'),
jsonb_path_query(COLUMNNAME, '$.** ? (@.name == "AGE" && @.label!=null)') -> 'label' as label
          
from TABLENAME 
WHERE  TIME BETWEEN to_timestamp(0)  AND to_timestamp(8640000000000)  GROUP BY  label ) as grouped group by label

I get result:

"count" "label"
1   """12"""
2   """18"""
3   """19"""
1   """31"""
2   """39"""
1   """48"""

But when I run this query:

select sum(count) as count,label from (select count('label'),
jsonb_path_query(COLUMNNAME, '$.** ? (@.name == "PEOPLE" && @.label!=null)') -> 'label' as label
          
from TABLENAME 
WHERE  TIME BETWEEN to_timestamp(0)  AND to_timestamp(8640000000000)  GROUP BY  label ) as grouped group by label

Result:

"count" "label"
28  """toast"""
14  """person"""

But the actual result is different in the table and must be:

"count" "label"
14  """toast"""
7   """person"""

Then I deleted one * char from the $.** part($.*), it works and results:

"count" "label"
14  """toast"""
7   """person"""

I don't know what to do now. Do I have to change the way of querying or maybe query differently ($.** or $.*) for nested or non-nested structures?

Upvotes: 0

Views: 76

Answers (1)

SelVazi
SelVazi

Reputation: 16063

From Docs :

SQL/JSON path expressions have two modes of handling structural errors:

lax (default) — the path engine implicitly adapts the queried data to the specified path. Any remaining structural errors are suppressed and converted to empty SQL/JSON sequences.

strict — if a structural error occurs, an error is raised.

Also :

The .** accessor can lead to surprising results when using the lax mode. For instance, the following query selects every HR value twice : lax $.**.HR

Also :

To avoid surprising results, we recommend using the .** accessor only in the strict mode. The following query selects each HR value just once : strict $.**.HR

Since the default mode is lax, The .** accessor will lead to twice values in your case :

select jsonb_path_query(COLUMNNAME, '$.** ? (@.name == "PEOPLE" && @.label!=null)') -> 'label' as label
from meta_table_cum_115 

Is similar to this :

select jsonb_path_query(COLUMNNAME, 'lax $.** ? (@.name == "PEOPLE" && @.label!=null)') -> 'label' as label
from meta_table_cum_115 

Which for data :

create table meta_table_cum_115 (
  time   timestamp,  
  COLUMNNAME  jsonb
);

insert into meta_table_cum_115 values
('2023-12-22 12:25:31.001244', '{"1": [{"results": {"2": { "name": "AGE", "label": "12"}}, "name": "PEOPLE", "label": "toast"}]}');

Will returns :

label
"toast"
"toast"

So to avoid getting duplicates use the strict mode as suggested in the documentation :

select jsonb_path_query(COLUMNNAME, 'strict $.** ? (@.name == "PEOPLE" && @.label!=null)') -> 'label' as label
from meta_table_cum_115 

Will result the correct data :

label
"toast"

Demo here

Upvotes: 1

Related Questions