Dylan
Dylan

Reputation: 913

pivot rows to columns in redshift dynamically

I have a Redshift table with different cases, company and date(timestamp)

I used the following query to aggregate number of cases per company by month:

      SELECT
    DATE_TRUNC('MONTH', case_date) AS month,
    company_id,
    COUNT(DISTINCT case_number)      AS case_count
  FROM case
  WHERE case_date >= '2017-01-01'
  AND case_date < DATE_TRUNC('MONTH', CURRENT_DATE)
  GROUP BY 1, 2
  ORDER BY 1

And got:

month                  company_id       case_count
2017-01-01 00:00:00     5786            4
2017-01-01 00:00:00     8681            1
.........               ....         .....
2018-08-01              ....         .....      

I want to pivot the table by company. The desired output is to have each company on a row and columns are each month from 2017-01-01 to the previous month of the current date. So it will look like:

  company_id   2017-01-01_case_count  2017-02-01_case_count  .. 2018-08-01..
    5786          4                        7

I checked many solutions were using CASE, but I think it's not a solution for me since I have so many different months. I also want the solution to be dynamic so that later on I don't have to come back and change my query as time goes by.

I can certainly dump all the data into Python and do this but I want to solve the problem within Redshift.

Upvotes: 1

Views: 3446

Answers (1)

MatBailie
MatBailie

Reputation: 86783

At the very least you need to read the data once to work out what columns you're going to need, then use that information to dynamically write new sql, then run it.

This is necessary because SQL's structure requires queries/datasets to have fixed signatures (column names, data-types, ets) and not dynamic ones (there is no native sql that does what you ask). Which is fine with normalised structures such as you're starting with.

It also means that SQL is not normally the right place fo reformat your data for presentation to a human. That's what your presentation layer should do.

In your case, it looks like you know the columns you're going to want from the parameters in your WHERE clause. This means your python can write the dynamic SQL just from those parameters...

SELECT
  company_id,
  COUNT(DISTINCT CASE WHEN case_date >= '2017-01-01' AND case_date < '2017-02-01' THEN case_number END)   AS 201701_case_count,
  COUNT(DISTINCT CASE WHEN case_date >= '2017-02-01' AND case_date < '2017-03-01' THEN case_number END)   AS 201702_case_count,
  ...
  COUNT(DISTINCT CASE WHEN case_date >= '2018-09-01' AND case_date < '2018-10-01' THEN case_number END)   AS 201809_case_count
FROM
  case
WHERE
      case_date >= '2017-01-01'
  AND case_date <  DATE_TRUNC('MONTH', CURRENT_DATE)
GROUP BY
  company_id

Now, whether or not you should, that's a different matter.

Upvotes: 2

Related Questions