Reputation: 609
I want to find the top N names (those with highest value) for the latest period. And for these names I want to get the historic values.
Per this question Select top 10 records for each category
I have tried the following:
base_table (period, name, value)
CREATE VIEW TOP3 AS
SELECT DISTINCT a.*, COUNT(*) as rank
FROM base_table AS a
LEFT JOIN base_table AS a2
ON a.period = a2.period and a.value <= a2.value
GROUP BY a.value
HAVING COUNT(*) <= 3
ORDER BY period, rank;
And this gets me the top 3 for each period.
period | name | value | rank
092020 Joe 100 1
092020 Jane 90 2
092020 Doe 80 3
082020 Jan 100 1
082020 Doc 99 2
082020 Pete 98 3
.
.
.
But not the desired result.
Let's say these are the top 3 for the latest month:
period | name | value | rank
092020 Joe 100 1
092020 Jane 90 2
092020 Doe 80 3
Now I want to get the history for these, even though they are outside top 3 previous month, like so:
period | name | value | rank
092020 Joe 100 1
092020 Jane 90 2
092020 Doe 80 3
082020 Joe 95 4
082020 Jane 94 5
082020 Doe 7 50
. . . .
. . . .
. . . .
092019 Joe 42 20
092019 Doe 34 35
092019 Jane 1 100
FYI I am working in SAS, using PROC SQL, so niceties like CTE's, window functions and other stuff are not available to me.
Upvotes: 0
Views: 1025
Reputation: 51611
Not sure why you would use SQL for this. Here is a method using OUTOBS option of PROC SQL.
First let's make some actual test data:
data have ;
input period :ddmmyy. name $ value rank ;
format period yymm7. ;
cards;
01092020 Joe 100 1
01092020 Jane 90 2
01092020 Doe 80 3
01082020 Joe 95 4
01082020 Jane 94 5
01082020 Doe 7 50
01092019 Joe 42 20
01092019 Doe 34 35
01092019 Jane 1 100
;
Now let's set N into a macro variable to make it a little more flexible. Since the example data only has 3 names let's use N of 2.
%let n=2;
So first find the top N for the most recent (MAX) period.
proc sql;
reset outobs=&n;
create table top&n as select name
from have
having period=max(period)
order by value desc
;
Now reset OUTOBS to MAX and use that TOPn list to pull all of the data from the original dataset for those names.
reset outobs=max;
create table want as
select * from have
where name in (select name from top&n)
;
quit;
Let's look at the results:
proc print data=want;
run;
Which is:
Obs period name value rank
1 2020M09 Joe 100 1
2 2020M09 Jane 90 2
3 2020M08 Joe 95 4
4 2020M08 Jane 94 5
5 2019M09 Joe 42 20
6 2019M09 Jane 1 100
Upvotes: 0
Reputation: 1270411
You can use JOIN
:
SELECT a.*, ar.rank
FROM base_table a JOIN
(SELECT a.value, COUNT(*) as rank
FROM base_table a JOIN
base_table a2
ON a.period = a2.period and a.value <= a2.value JOIN
(SELECT max(a3.period) as max_period
FROM base_table a3
) a3
ON a3.max_period = period
GROUP BY a.value
HAVING COUNT(*) <= 3
) ar
ON ar.value = a.value
ORDER BY period, rank;
I suspect this would be simpler using a data
step, at least for assigning the rank.
Upvotes: 1
Reputation: 501
Can you try it with subqueries? if not you can create a view instead ordered with some row_number() / rank function to JOIN
SELECT b.period, b.name, b.value
FROM base_table b
INNER JOIN
(
-- subquery to get last period
SELECT TOP 3 name, period, value
FROM base_table
WHERE period = (SELECT MAX(period) FROM base_table)
ORDER BY 3 DESC
) AS s ON s.name = b.name -- join them
Upvotes: 1