Reputation: 37
I am dealing with a large dataset (30 million rows) and I need to pull the most recent three dates (which may have an indeterminate number of rows attached to them) so like 03MAR2016 might have 2 rows 27FEB2016 might have ten and 25FEB2016 might have 3. How do I say "Select everything that falls within the last X number of values in this set regardless of how many rows there are"?
Upvotes: 2
Views: 1041
Reputation: 797
As you can not sort in an in-line view/subquery you will have to split your SQL statement in two parts:
But as stated before, SQL is not good at this kind of operation.
DATA input_data ;
INPUT date value ;
CARDS ;
20160101 1
20160101 2
20160101 3
20160102 1
20160103 1
20160104 1
20160105 1
20160105 2
20160105 3
;
proc sql _method;
create table DATE_ID as
select distinct DATE
from input_data
order by DATE DESC;
create table output_data as
select data.*
from (select *
from DATE_ID
where monotonic() <= 3
) id
inner join input_data data
on id.DATE = data.DATE
;
quit;
Upvotes: 1
Reputation: 63424
You need to break this down into two tasks.
Both are possible in SQL, though the first is much easier using other methods (SAS's SQL isn't very good at getting the "first X things").
I would suggest using something like PROC FREQ
or PROC TABULATE
to generate the list of dates (just a PROC FREQ on the date variable), really any proc you're comfortable with - even PROC SORT
would work (though that's probably less efficient). Then once you have that table, limit it to the three highest observations, and then you can use it in a SQL step to join to the main table and filter to those three dates - or you can use other options, like creating a custom format or hash tables or whatever works for you. 30 million rows isn't so many that a SQL join should be a problem, though, I'd think.
Upvotes: 1