Adam Kolkman

Reputation: 132

How to do calculations on json data in Postgres

I'm storing AdWords report data in Postgres. Each report is stored in a table named Reports, which has a jsonb column named 'data'. Each report has JSON stored in its 'data' field that looks like this:

[
  {
    "match_type": "exact",
    "search_query": "gm hubcaps",
    "conversions": 2,
    "cost": 1.24
  },
  {
    "match_type": "broad",
    "search_query": "gm auto parts",
    "conversions": 34,
    "cost": 21.33
  },
  {
    "match_type": "phrase",
    "search_query": "silverdo headlights",
    "conversions": 63,
    "cost": 244.05
  }
]

What I want to do is query off these data hashes and sum up the total number of conversions for a given report. I've looked through the PostgreSQL docs and it looks like you can only really do calculations on hashes, not arrays of hashes like this. Is what I'm trying to do possible in Postgres? Do I need to make a temp table out of this array and do calculations off that? Or can I use a stored procedure?

I'm using PostgreSQL 9.4

EDIT: The reason I'm not just using a regular, normalized table is that this is just one example of how report data could be structured. In my project, reports have to allow arbitrary keys, because they are populated by users uploading CSVs with any columns they like. It's basically a way to get around having arbitrarily many user-created tables.

Upvotes: 2

Views: 1785

Answers (2)

Erwin Brandstetter

Reputation: 657202

What I want to do is query off these data hashes and sum up the conversions

The fastest way should be with jsonb_populate_recordset(). But you need a registered row type for it.

CREATE TEMP TABLE report_data (
--   match_type text    -- commented out, because we only need ..
-- , search_query text  -- .. conversions for this query
     conversions int
-- , cost numeric
);

A temp table is one way to register a row type ad hoc.

Assuming a table report with report_id as PK, for lack of information.

SELECT r.report_id, sum(d.conversions) AS sum_conversions
FROM   report r
LEFT   JOIN LATERAL jsonb_populate_recordset(null::report_data, r.data) d ON true
-- WHERE  r.report_id = 12345  -- only for given report?
GROUP  BY 1;

The LEFT JOIN ensures you get a result, even if data is NULL or empty or the JSON array is empty.
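For illustration, here is a hypothetical setup using the sample data from the question (the table and column names are assumptions, matching the query above):

```sql
-- Hypothetical setup with the sample data from the question
CREATE TEMP TABLE report (report_id int PRIMARY KEY, data jsonb);

INSERT INTO report VALUES (1, '[
  {"match_type": "exact",  "search_query": "gm hubcaps",          "conversions": 2,  "cost": 1.24},
  {"match_type": "broad",  "search_query": "gm auto parts",       "conversions": 34, "cost": 21.33},
  {"match_type": "phrase", "search_query": "silverdo headlights", "conversions": 63, "cost": 244.05}
]');
```

Against this row, the grouped query above returns report_id = 1 with sum_conversions = 99 (2 + 34 + 63).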

For a sum from a single row in the underlying table, this is faster:

SELECT d.sum_conversions
FROM   report r
LEFT   JOIN LATERAL (
   SELECT sum(conversions) AS sum_conversions
   FROM   jsonb_populate_recordset(null::report_data, r.data)
   ) d ON true
WHERE  r.report_id = 12345;  -- enter report_id here

Alternative with jsonb_array_elements() (no need for a registered row type):

SELECT d.sum_conversions
FROM   report r
LEFT   JOIN LATERAL (
   SELECT sum((value->>'conversions')::int) AS sum_conversions
   FROM   jsonb_array_elements(r.data)
   ) d ON true
WHERE  r.report_id = 12345;  -- enter report_id here

Normally you would implement this as plain, normalized table. I don't see the benefit of JSON here (except that your application seems to require it, like you added).
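For comparison, a normalized design might look like this (a sketch only; table and column names are assumptions, and it would not accommodate the arbitrary keys your edit mentions):

```sql
-- Sketch of a normalized alternative: one row per report line item
CREATE TABLE report_row (
   report_id    int NOT NULL  -- FK to the parent report
 , match_type   text
 , search_query text
 , conversions  int
 , cost         numeric
);

-- Summing conversions then needs no JSON functions at all:
SELECT report_id, sum(conversions) AS sum_conversions
FROM   report_row
GROUP  BY 1;
```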

Upvotes: 1

martin
martin

Reputation: 2638

You could use unnest:

select sum(conv) from
(select (d->>'conversions')::int as conv from
(select unnest(data) as d from <your table>) all_data
) all_conv

Disclaimer: I don't have Pg 9.4, so I couldn't test it myself.

EDIT: This is assuming that the array you mentioned is a PostgreSQL array, i.e. that the data type of your data column is character varying[]. If the data is a json array, you should be able to use json_array_elements instead of unnest.
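Since the question says the column is jsonb, that substitution would look roughly like this (table name assumed; jsonb_array_elements is the jsonb counterpart of json_array_elements, both available in 9.4):

```sql
-- Same shape as above, but for a jsonb column named data
select sum(conv) from
(select (d->>'conversions')::int as conv from
(select jsonb_array_elements(data) as d from reports) all_data
) all_conv
```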

Upvotes: 1
