Reputation: 527
I'm looking for advice on the best (most efficient) way to compute the top 5 values in magnitude for multiple columns, and I also need to compute the sum for another column.
Say I have data with the headings: (person, daydate, month, dailyqty, dailymax1, dailymax2), where for each day for each person i have the total quantity, the max quantity (measure 1) and the max quantity (measure 2).
What I want to do is compute for each person, for each month (1) the sum of dailyqty, (2) the top 5 values of dailymax1 (3) the top 5 values for dailymax2. It may be the case that there aren't even 5 daily values in a month, in which case I wish to return null.
I can't think of how to do this without a join, since I'm an sql newbie. I'm aware that the sum of dailyqty will be repeated for each of the top 5 values in the group - this is okay.
Some dummy data:
CREATE TABLE test (
person varchar(50),
daydate date,
month integer,
dailyqty double precision,
dailymax1 double precision,
dailymax2 double precision
);
INSERT INTO test(person, daydate, month, dailyqty, dailymax1, dailymax2)
VALUES
('A', '2015-01-01', 1, 5, 0.5, 4),
('A', '2015-01-02', 1, 8, 3, 4),
('A', '2015-01-03', 1, 7, 1, 3),
('A', '2015-01-04', 1, 1, 2, 2),
('A', '2015-01-05', 1, 9, 6, 8),
('A', '2015-01-06', 1, 7, 2.5, 7),
('A', '2015-01-07', 1, 2, 4, 7),
('A', '2015-01-08', 1, 5, 1, 3),
('B', '2015-01-01', 1, 20, 8, 1),
('B', '2015-01-02', 1, 22, 9, 2)
Desired result
Thanks! A
Upvotes: 1
Views: 402
Reputation: 1133
This query replicates the desired results posted in the question:
SELECT xt1.person, xt1.month, xt1.monthlyqty, xt3.max1, xt4.max2
FROM (
SELECT SUM(COALESCE(t.dailyqty, 0)) as monthlyqty, t.person, t.month
FROM test t
GROUP by t.person, t.month
) xt1
CROSS JOIN (
VALUES (1), (2), (3), (4), (5)
) xt2
LEFT OUTER JOIN (
SELECT t.person, t.month, t.dailymax1 as max1
, ROW_NUMBER() OVER (PARTITION BY t.person, t.month ORDER BY t.dailymax1 DESC NULLS LAST) as colnumber
FROM test t
) xt3 ON xt2.column1 = xt3.colnumber AND xt1.person = xt3.person AND xt1.month = xt3.month
LEFT OUTER JOIN (
SELECT t.person, t.month, t.dailymax2 as max2
, ROW_NUMBER() OVER (PARTITION BY t.person, t.month ORDER BY t.dailymax2 DESC NULLS LAST) as colnumber
FROM test t
) xt4 ON xt2.column1 = xt4.colnumber AND xt1.person = xt4.person AND xt1.month = xt4.month;
Some things to consider that could change the query... First you could think about whether the columns dailyqty, dailymax1 and dailymax2 are really nullable (like in your table definition). If they were not, you could simplify the COALESCE(t.dailyqty, 0)
to just t.dailyqty
and the two DESC NULLS LAST
to just DESC
.
Second, you may consider replacing the CROSS JOIN
to xt2
with a JOIN to a call to generate_series
like: CROSS JOIN generate_series (1, 5) xt2
and then replacing xt2.column1
appearances with just xt2
. I'm not sure which approach would be more efficient, maybe both do a similar thing, but it is worth to check with your real data if there are significant differences.
Last, you say you want to compute for each person and month, but "month" could refer to the "month" column or the month in the "daydate" column. I've picked the first option as it is easier to write :), but modifying a couple of things the query could be adapted for the other column.
Upvotes: 2
Reputation: 31312
This query gives the output you need:
WITH FilledData AS
(
WITH Filler AS
(
WITH t1 AS
(
SELECT DISTINCT person, month FROM test
),
t2 AS
(
SELECT generate_series as order FROM generate_series(1, 4)
)
SELECT t1.person, t1.month, CAST(NULL AS double precision) AS dailyqty, CAST(NULL AS double precision) AS dailymax1, CAST(NULL AS double precision) AS dailymax2 FROM t1 CROSS JOIN t2
)
SELECT person, month, dailyqty, dailymax1, dailymax2 FROM test UNION ALL
SELECT person, month, dailyqty, dailymax1, dailymax2 FROM Filler ORDER BY person, month
),
monthlyqty AS
(
SELECT person, month, SUM(dailyqty) AS monthlyqty FROM test GROUP BY person, month
),
dailymax1_table AS
(
SELECT person, month, dailymax1, dailymax1_order
FROM (
SELECT *, row_number() over (partition by person, month order by dailymax1 desc NULLS LAST) as dailymax1_order
FROM FilledData
) t1 WHERE dailymax1_order <= 5
),
dailymax2_table AS
(
SELECT person, month, dailymax2, dailymax2_order
FROM (
SELECT *, row_number() over (partition by person, month order by dailymax2 desc NULLS LAST) as dailymax2_order
FROM FilledData
) t2 WHERE dailymax2_order <= 5
)
SELECT dailymax1_table.person, dailymax1_table.month, monthlyqty.monthlyqty, dailymax1_table.dailymax1 as max1, dailymax2_table.dailymax2 as max2
FROM dailymax1_table JOIN monthlyqty
ON monthlyqty.person = dailymax1_table.person AND
monthlyqty.month = dailymax1_table.month
JOIN dailymax2_table ON
dailymax1_table.person = dailymax2_table.person AND
dailymax1_table.month = dailymax2_table.month AND
dailymax1_table.dailymax1_order = dailymax2_table.dailymax2_order;
Upvotes: 1
Reputation: 1270873
You can use window functions to put this together:
select . . .
from (select t.*,
sum(dailyqty) over (partition by person, date_trunc('month', datecol)) as monthqty,
row_number() over (partition by person, date_trunc('month', datecol) order by dailyqty desc) as seqnum
from t
) t
where seqnum <= 5;
You can extract the columns that you want from the subquery.
Upvotes: 0