Kumachorou
Kumachorou

Reputation: 37

How to pull the last N number of dates in SAS SQL

I am dealing with a large dataset (30 million rows) and I need to pull the most recent three dates (which may have an indeterminate number of rows attached to them) so like 03MAR2016 might have 2 rows 27FEB2016 might have ten and 25FEB2016 might have 3. How do I say "Select everything that falls within the last X number of values in this set regardless of how many rows there are"?

Upvotes: 2

Views: 1041

Answers (2)

Jetzler
Jetzler

Reputation: 797

As you can not sort in an in-line view/subquery you will have to split your SQL statement in two parts:

  1. Sort the date DESCENDING and get the distinct values
  2. Join back to the original data and limit to first 3

But as stated before, SQL is not good at this kind of operation.

DATA input_data ; 
  INPUT date value ; 
CARDS ; 
20160101 1
20160101 2
20160101 3 
20160102 1 
20160103 1
20160104 1
20160105 1
20160105 2
20160105 3
; 

proc sql _method;
create table DATE_ID as
    select distinct DATE
        from input_data
            order by DATE DESC;

create table output_data as
    select data.*
        from (select *
                from DATE_ID
                where monotonic() <= 3
                    ) id
        inner join input_data data
            on id.DATE = data.DATE
        ;
quit;

Upvotes: 1

Joe
Joe

Reputation: 63424

You need to break this down into two tasks.

  1. Determine which dates are the last three dates
  2. Pull all rows from those dates

Both are possible in SQL, though the first is much easier using other methods (SAS's SQL isn't very good at getting the "first X things").

I would suggest using something like PROC FREQ or PROC TABULATE to generate the list of dates (just a PROC FREQ on the date variable), really any proc you're comfortable with - even PROC SORT would work (though that's probably less efficient). Then once you have that table, limit it to the three highest observations, and then you can use it in a SQL step to join to the main table and filter to those three dates - or you can use other options, like creating a custom format or hash tables or whatever works for you. 30 million rows isn't so many that a SQL join should be a problem, though, I'd think.

Upvotes: 1

Related Questions