Reputation: 913
I have a Redshift table with different cases, company and date(timestamp)
I used the following query to aggregate number of cases per company by month:
SELECT
DATE_TRUNC('MONTH', case_date) AS month,
company_id,
COUNT(DISTINCT case_number) AS case_count
FROM case
WHERE case_date >= '2017-01-01'
AND case_date < DATE_TRUNC('MONTH', CURRENT_DATE)
GROUP BY 1, 2
ORDER BY 1
And got:
month company_id case_count
2017-01-01 00:00:00 5786 4
2017-01-01 00:00:00 8681 1
......... .... .....
2018-08-01 .... .....
I want to pivot the table by company. The desired output is to have each company on a row and columns are each month from 2017-01-01 to the previous month of the current date. So it will look like:
company_id 2017-01-01_case_count 2017-02-01_case_count .. 2018-08-01..
5786 4 7
I checked many solutions were using CASE, but I think it's not a solution for me since I have so many different months. I also want the solution to be dynamic so that later on I don't have to come back and change my query as time goes by.
I can certainly dump all the data into Python and do this but I want to solve the problem within Redshift.
Upvotes: 1
Views: 3446
Reputation: 86783
At the very least you need to read the data once to work out what columns you're going to need, then use that information to dynamically write new sql, then run it.
This is necessary because SQL's structure requires queries/datasets to have fixed signatures (column names, data-types, ets) and not dynamic ones (there is no native sql that does what you ask). Which is fine with normalised structures such as you're starting with.
It also means that SQL is not normally the right place fo reformat your data for presentation to a human. That's what your presentation layer should do.
In your case, it looks like you know the columns you're going to want from the parameters in your WHERE
clause. This means your python can write the dynamic SQL just from those parameters...
SELECT
company_id,
COUNT(DISTINCT CASE WHEN case_date >= '2017-01-01' AND case_date < '2017-02-01' THEN case_number END) AS 201701_case_count,
COUNT(DISTINCT CASE WHEN case_date >= '2017-02-01' AND case_date < '2017-03-01' THEN case_number END) AS 201702_case_count,
...
COUNT(DISTINCT CASE WHEN case_date >= '2018-09-01' AND case_date < '2018-10-01' THEN case_number END) AS 201809_case_count
FROM
case
WHERE
case_date >= '2017-01-01'
AND case_date < DATE_TRUNC('MONTH', CURRENT_DATE)
GROUP BY
company_id
Now, whether or not you should, that's a different matter.
Upvotes: 2