How to create a pivot table by product by month in SQL

I have 3 tables:

users (id, account_balance)
grocery (user_id, date, amount_paid)
fishmarket (user_id, date, amount_paid)

Both fishmarket and grocery tables may have multiple occurrences for the same user_id with different dates and amounts paid or have nothing at all for any given user. I am trying to develop a pivot table of the following structure:

id | grocery_amount_paid_January | fishmarket_amount_paid_January
  1          10                           NULL
  2          40                           71

The only idea I can come with is to create multiple left joins, but this should be wrong since there will be 24 joins (per each month) for each product. Is there a better way?

Upvotes: 1

Answers (1)

Erwin Brandstetter

Reputation: 657202

I have provided a lot of answers on crosstab queries in PostgreSQL lately. Sometimes a "plain" query like the following does the job:

WITH x AS (SELECT '2012-01-01'::date AS _from
                 ,'2012-12-01'::date As _to)  -- provide date range once in CTE
SELECT u.id
      ,to_char(m.mon, 'MM.YYYY') AS month_year
      ,g.amount_paid AS grocery_amount_paid
      ,f.amount_paid AS fishmarket_amount_paid
FROM   users u
CROSS  JOIN (SELECT generate_series(_from, _to, '1 month') AS mon FROM x) m 
LEFT   JOIN (
   SELECT user_id
         ,date_trunc('month', date) AS mon
         ,sum(amount_paid) AS amount_paid
   FROM   x, grocery                        -- CROSS JOIN with a single row
   WHERE  date >= _from
   AND    date <  (_to + interval '1 month')
   GROUP  BY 1,2
   ) g ON g.user_id = u.id AND m.mon = g.mon
LEFT   JOIN (
   SELECT user_id
         ,date_trunc('month', date) AS mon
         ,sum(amount_paid) AS amount_paid
   FROM   x, fishmarket
   WHERE  date >= _from
   AND    date <  (_to + interval '1 month')
   GROUP  BY 1,2
   ) f ON f.user_id = u.id AND m.mon = g.mon
ORDER  BY u.id, m.mon;

produces this output:

id | month_year | grocery_amount_paid | fishmarket_amount_paid
---+------------+---------------------+------------------------
 1 | 01.2012    | 10                  | NULL
 1 | 02.2012    | NULL                | 65
 1 | 03.2012    | 98                  | 13
...
 2 | 02.2012    | 40                  | 71
 2 | 02.2012    | NULL                | NULL

Major points

The first CTE is for convenience only. So you have to type your date range once only. You can use any date range - as long as it's dates with the first of the month (rest of the month will be included!). You could add date_trunc() to it, but I guess you can keep the urge to use invalid dates in check.
First CROSS JOIN users to the result of generate_series() (m) which provides one row per month in your date range. You have learned in your last question how that results in multiple rows per user.
The two subqueries are identical twins. Use WHERE clauses that operate on the base column, so it can utilize an index - which you should have if your table runs over many years (no use for only one or two years, a sequential scan will be faster):
```
CREATE INDEX grocery_date ON grocery (date);
```
Then reduce all dates to the first of the month with date_trunc() and sum amount_paid per user_id and the resulting mon.
LEFT JOIN the result to the base table, again by user_id and the resulting mon. This way, rows are neither multiplied nor dropped. You get one row per user_id and month. Voilá.

BTW, I'd never use a column name id. Call it user_id in the table users as well.

Upvotes: 4

How to create a pivot table by product by month in SQL

Answers (1)

Major points

Related Questions