Reputation: 1537
I want to delete the latest 30 day records from a dataset. Then I have couple of ways to do so.
proc sql;
delete from server.data
where Date >= today() - 30;
quit;
OR
data server.data;
set server.data(where= (Date>= today() - 30));
run;
Which way is better? Or any faster procedures?
Upvotes: 1
Views: 10045
Reputation: 1449
I agree with RawFocus, but could I also suggest to have a look at this paper? It sets out a macro that you can use to compare any two sas steps in their efficiency across a number of metrics and on your specific environment. For example, as RawFocus pointed out, if the drive is of consideration, you might want a technique which performs better on I/O and this macro lets you compare that. I am using a version of this all the time when I am asking myself which way would be the quickest/most efficient:
http://www.lexjansen.com/nesug/nesug10/ma/ma09.pdf
Regards, Vasilij
Upvotes: 2
Reputation: 12691
The first approach is faster.
However this approach leaves the deleted observations in place (they are marked as deleted). This causes a difference between the NOBS and NLOBS properties of the table. So if space is a consideration, I'd recommend the second approach (using a macro variable constant in the same manner). This will rebuild the table without those records. Remember to recreate any indexes and constraints, which will be destroyed in the rebuild process..
EDIT: I had previously suggested that the below would be even faster, however this turned out not to be the case - see Joe's comment in the thread..
proc sql;
delete from server.data
where Date >= %eval(%sysfunc(today()) - 30);
Upvotes: 2