bjax221
bjax221

Reputation: 11

How can I pull certain records and update data in SAS?

I have a table of customer purchases. The goal is to be able to pull summary statistics on the last 20 purchases for each customer and update them as each new order comes in. What is the best way to do this? Do I need to a table for each customer? Keep in mind there are over 500 customers. Thanks.

Upvotes: 1

Views: 40

Answers (2)

momo1644
momo1644

Reputation: 1804

Yes, You can either add a Rank to your existing table or create another table containing the last 20 purchases for each customer.

My recommendation is to use a datasetp to select the top20 purchasers per customer then do your summary statistics. My Code below will create a table called "WANT" with the top 20 and a rank field.

Sample Data:

data have;
input id $ purchase_date amount;
informat purchase_date datetime19.;
format purchase_date datetime19.;
datalines;
cust01 21dec2017:12:12:30 234.57
cust01 23dec2017:12:12:30 2.88
cust01 24dec2017:12:12:30 4.99
cust02 21nov2017:12:12:30 34.5
cust02 23nov2017:12:12:30 12.6
cust02 24nov2017:12:12:30 14.01
;
run;

Sort Data in Descending order by ID and Date:

proc sort data=have ; 
by id descending purchase_date ;
run;

Select Top 2: Change my 2 to 20 in your case

/*Top 2*/
%let top=2;
data want (where=(Rank ne .));
set have;
by id;
retain i;
/*reset counter for top */
if first.id then do;  i=1; end;
if i <= &top then do; Rank= &top+1-i; output; i=i+1;end;
drop i;
run;

Output: Last 2 Customer Purchases:

id=cust01 purchase_date=24DEC2017:12:12:30 amount=4.99 Rank=2 
id=cust01 purchase_date=23DEC2017:12:12:30 amount=2.88 Rank=1 
id=cust02 purchase_date=24NOV2017:12:12:30 amount=14.01 Rank=2 
id=cust02 purchase_date=23NOV2017:12:12:30 amount=12.6 Rank=1 

Upvotes: 0

Joe
Joe

Reputation: 63434

This is asked at a high level, so I'll answer it at that level. If you want more detailed help, you'll want to give more detailed information, and make an attempt to solve the problem yourself.

In SAS, you have the BY statement available in every PROC or DATA step, as well as the CLASS statement, available in most PROCs. These both are useful for doing data analysis at a level below global. For many basic uses they give a similar result, although not in all cases; look up the particular PROC you're using to do your analysis for more detailed information.

Presumably, you'd create one table containing your most twenty recent records per customer, or even one view (a view is like a table, except it's not written to disk), and then run your analysis PROC BY your customer ID variable. If you set it up as a view, you don't even have to rerun that part - you can create a permanent view pointing to your constantly updating data, and the subsetting to last 20 records will happen every time you run the analysis PROC.

Upvotes: 1

Related Questions