Ryan Bostwick
Ryan Bostwick

Reputation: 419

How to create a pivot table by product by month in SQL

I have 3 tables:

users (id, account_balance)
grocery (user_id, date, amount_paid)
fishmarket (user_id, date, amount_paid)

Both fishmarket and grocery tables may have multiple occurrences for the same user_id with different dates and amounts paid or have nothing at all for any given user. I am trying to develop a pivot table of the following structure:

id | grocery_amount_paid_January | fishmarket_amount_paid_January
  1          10                           NULL
  2          40                           71

The only idea I can come with is to create multiple left joins, but this should be wrong since there will be 24 joins (per each month) for each product. Is there a better way?

Upvotes: 1

Views: 1662

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657202

I have provided a lot of answers on crosstab queries in PostgreSQL lately. Sometimes a "plain" query like the following does the job:

WITH x AS (SELECT '2012-01-01'::date AS _from
                 ,'2012-12-01'::date As _to)  -- provide date range once in CTE
SELECT u.id
      ,to_char(m.mon, 'MM.YYYY') AS month_year
      ,g.amount_paid AS grocery_amount_paid
      ,f.amount_paid AS fishmarket_amount_paid
FROM   users u
CROSS  JOIN (SELECT generate_series(_from, _to, '1 month') AS mon FROM x) m 
LEFT   JOIN (
   SELECT user_id
         ,date_trunc('month', date) AS mon
         ,sum(amount_paid) AS amount_paid
   FROM   x, grocery                        -- CROSS JOIN with a single row
   WHERE  date >= _from
   AND    date <  (_to + interval '1 month')
   GROUP  BY 1,2
   ) g ON g.user_id = u.id AND m.mon = g.mon
LEFT   JOIN (
   SELECT user_id
         ,date_trunc('month', date) AS mon
         ,sum(amount_paid) AS amount_paid
   FROM   x, fishmarket
   WHERE  date >= _from
   AND    date <  (_to + interval '1 month')
   GROUP  BY 1,2
   ) f ON f.user_id = u.id AND m.mon = g.mon
ORDER  BY u.id, m.mon;

produces this output:

id | month_year | grocery_amount_paid | fishmarket_amount_paid
---+------------+---------------------+------------------------
 1 | 01.2012    | 10                  | NULL
 1 | 02.2012    | NULL                | 65
 1 | 03.2012    | 98                  | 13
...
 2 | 02.2012    | 40                  | 71
 2 | 02.2012    | NULL                | NULL

Major points

  • The first CTE is for convenience only. So you have to type your date range once only. You can use any date range - as long as it's dates with the first of the month (rest of the month will be included!). You could add date_trunc() to it, but I guess you can keep the urge to use invalid dates in check.

  • First CROSS JOIN users to the result of generate_series() (m) which provides one row per month in your date range. You have learned in your last question how that results in multiple rows per user.

  • The two subqueries are identical twins. Use WHERE clauses that operate on the base column, so it can utilize an index - which you should have if your table runs over many years (no use for only one or two years, a sequential scan will be faster):

    CREATE INDEX grocery_date ON grocery (date);
    
  • Then reduce all dates to the first of the month with date_trunc() and sum amount_paid per user_id and the resulting mon.

  • LEFT JOIN the result to the base table, again by user_id and the resulting mon. This way, rows are neither multiplied nor dropped. You get one row per user_id and month. Voilá.

BTW, I'd never use a column name id. Call it user_id in the table users as well.

Upvotes: 4

Related Questions