Clay
Clay

Reputation: 2999

How do I efficiently calculate summary stats on JSONB arrays nested in Postgres?

Using Postgres 9.6.

I have this working but suspect there's a more efficient way. What's the best way to calculate AVG, SUM, etc. on the MyEventLength arrays?

DROP TABLE IF EXISTS activity;
DROP SEQUENCE IF EXISTS activity_id_seq;
CREATE SEQUENCE activity_id_seq;

CREATE TABLE activity (
    id INT CHECK (id > 0) NOT NULL DEFAULT NEXTVAL ('activity_id_seq'),
    user_id INT,
    events JSONB
);

INSERT INTO activity (user_id,events) VALUES
(1, '{"MyEvent":{"MyEventLength":[450,790,1300,5400],"MyEventValue":[334,120,120,940]}}'),
(1, '{"MyEvent":{"MyEventLength":[12],"MyEventValue":[4]}}'),
(2, '{"MyEvent":{"MyEventLength":[450,790,1300,5400],"MyEventValue":[334,120,120,940]}}'),
(1, '{"MyEvent":{"MyEventLength":[1000,2000],"MyEventValue":[450,550]}}');

To date, this is the best way I can figure out to calculate the average for the MyEventLength array for user_id 1:

SELECT avg(recs::text::numeric) FROM (
    SELECT jsonb_array_elements(a.event_length) as recs FROM (
        SELECT events->'MyEvent'->'MyEventLength' as event_length from activity
        WHERE user_id = 1
    )a
) b;

Or this variation:

SELECT avg(recs) FROM (
    SELECT jsonb_array_elements_text(a.event_length)::numeric as recs FROM (
        SELECT events->'MyEvent'->'MyEventLength' as event_length from activity
        WHERE user_id = 1
    )a
) b;

Is there a better way to do this that does not require as many sub selects?

Upvotes: 1

Views: 340

Answers (1)

Nick
Nick

Reputation: 2513

You need to pass rows with scalar values to avg(), otherwise (if you'll try to pass the output of some set-returning function like jsonb_array_elements_text(..)) you will get an errors such as this:

ERROR:  set-valued function called in context that cannot accept a set

So you definitely need at least 1 sub-query or CTE.

Option 1, w/o CTE:

select avg(v::numeric)
from (
  select
    jsonb_array_elements_text(events->'MyEvent'->'MyEventLength')
  from activity
  where user_id = 1
) as a(v);

Option 2, CTE (readability is better):

with vals as (
  select
    jsonb_array_elements_text(events->'MyEvent'->'MyEventLength')::numeric as val
  from activity
  where user_id = 1
)
select avg(val)
from vals
;

UPDATE, Option 3: It turned out, that you can do it w/o any nested queries, using implicit JOIN LATERAL:

select avg(val::text::numeric)
from activity a, jsonb_array_elements(a.events->'MyEvent'->'MyEventLength') vals(val)
where user_id = 1;

Upvotes: 1

Related Questions