Reputation: 43
I have a timescaleDB, I am trying to make a query using jsonb_path_query but I am getting duplicate results.
Table structure:
SELECT * FROM public.meta_table_cum_115
"time" "COLUMNNAME"
"2023-12-22 12:25:31.001244" "{""1"": [{""results"": {""2"": { ""name"": ""AGE"", ""label"": ""12""}}, ""name"": ""PEOPLE"", ""label"": ""toast""}]}"
As you can see there is a nested structure.
I found a mailing at PostgreSQL Website about this issue. As I understand it was a bug, but I cannot find a solution anywhere.
When I run a query:
select sum(count) as count,label from (select count('label'),
jsonb_path_query(COLUMNNAME, '$.** ? (@.name == "AGE" && @.label!=null)') -> 'label' as label
from TABLENAME
WHERE TIME BETWEEN to_timestamp(0) AND to_timestamp(8640000000000) GROUP BY label ) as grouped group by label
I get result:
"count" "label"
1 """12"""
2 """18"""
3 """19"""
1 """31"""
2 """39"""
1 """48"""
But when I run this query:
select sum(count) as count,label from (select count('label'),
jsonb_path_query(COLUMNNAME, '$.** ? (@.name == "PEOPLE" && @.label!=null)') -> 'label' as label
from TABLENAME
WHERE TIME BETWEEN to_timestamp(0) AND to_timestamp(8640000000000) GROUP BY label ) as grouped group by label
Result:
"count" "label"
28 """toast"""
14 """person"""
But the actual result is different in the table and must be:
"count" "label"
14 """toast"""
7 """person"""
Then I deleted one * char from the $.** part($.*), it works and results:
"count" "label"
14 """toast"""
7 """person"""
I don't know what to do now. Do I have to change the way of querying or maybe query differently ($.** or $.*) for nested or non-nested structures?
Upvotes: 0
Views: 76
Reputation: 16063
From Docs :
SQL/JSON path expressions have two modes of handling structural errors:
lax (default) — the path engine implicitly adapts the queried data to the specified path. Any remaining structural errors are suppressed and converted to empty SQL/JSON sequences.
strict — if a structural error occurs, an error is raised.
Also :
The .** accessor can lead to surprising results when using the lax mode. For instance, the following query selects every HR value twice :
lax $.**.HR
Also :
To avoid surprising results, we recommend using the .** accessor only in the strict mode. The following query selects each HR value just once :
strict $.**.HR
Since the default mode is lax
, The .** accessor will lead to twice values in your case :
select jsonb_path_query(COLUMNNAME, '$.** ? (@.name == "PEOPLE" && @.label!=null)') -> 'label' as label
from meta_table_cum_115
Is similar to this :
select jsonb_path_query(COLUMNNAME, 'lax $.** ? (@.name == "PEOPLE" && @.label!=null)') -> 'label' as label
from meta_table_cum_115
Which for data :
create table meta_table_cum_115 (
time timestamp,
COLUMNNAME jsonb
);
insert into meta_table_cum_115 values
('2023-12-22 12:25:31.001244', '{"1": [{"results": {"2": { "name": "AGE", "label": "12"}}, "name": "PEOPLE", "label": "toast"}]}');
Will returns :
label
"toast"
"toast"
So to avoid getting duplicates use the strict mode as suggested in the documentation :
select jsonb_path_query(COLUMNNAME, 'strict $.** ? (@.name == "PEOPLE" && @.label!=null)') -> 'label' as label
from meta_table_cum_115
Will result the correct data :
label
"toast"
Upvotes: 1