Morten
Morten

Reputation: 43

SAS: How do I point to a specific observation of a value?

I'm very new to SAS and I'm trying to figure out some basic things available in other languages.

I have a table

ID  Number
--  ------
1   2
2   5
3   6
4   1

I would like to create a new variable where I sum the value of one observation of Number to each other observations, like

Number2 = Number + Number[3]

ID  Number  Number2
--  ------  ------
1   2       8
2   5       11
3   6       12
4   1       7

How to I get the value of third observation of Number and add this to each observation of Number in a new variable?

Upvotes: 4

Views: 19621

Answers (2)

Joe
Joe

Reputation: 63434

I'll start by suggesting that Base SAS doesn't really work this way, normally; it's not that it can't, but normally you can solve most problems without pointing to a specific row.

So while this answer will solve your explicit problem, it's probably not something useful in a real world scenario; usually in the real world you'd have a match key or some other element other than 'row number' to combine with, and if you did then you could do it much more efficiently. You also likely could rearrange your data structure in a way that made this operation more convenient.

That said, the specific example you give is trivial:

data have;
input ID  Number;
datalines;
1   2
2   5
3   6
4   1
;;;;
run;

data want;
set have;
_t = 3;
set have(rename=number=number3 keep=number) point=_t ;
number2=number+number3;
run;

If you have SAS/IML (SAS's matrix language), which is somewhat similar to R, then this is a very different story both in your likelihood to perform this operation and in how you'd do it.

proc iml;
 a= {1 2, 2 5, 3 6, 4 1}; *create initial matrix;
 b = a[,2] + a[3,2]; *create a new matrix which is the 2nd column of a added 
                      elementwise to the value in the third row second column;
 c = a||b; *append new matrix to a - could be done in same step of course;
 print b c;
quit;

To do this with the First observation, it's a lot easier.

data want;
set have;
retain _firstpoint; *prevents _firstpoint from being set to missing each iteration;
if _n_ = 1 then _firstpoint=number; *on the first iteration (usually first row) set to number's value;
number = number - _firstpoint; *now subtract that from number to get relative value;
run;

I'll elaborate a little more on this. SAS works on a record-by-record level, where each record is independently processed in the DATA step. (PROCs on the other hand may not behave this way, though many do at some level). SAS, like SQl and similar databases, doesn't truly acknowledge that any row is "first" or "second" or "nth"; however, unlike SQL, it does let you pretend that it is, based on the current sort. The POINT= random access method is one way to go about doing that.

Most of the time, though, you're going to be using something in the data to determine what you want to do rather than some related to the ordering of the data. Here's a way you could do the same thing as the POINT= method, but using the value of ID:

data want; if n = 1 then set have(where=(ID=3) rename=number=number3); set have; number2=number+number3; run;

That in the first iteration of the data step (_N_=1) takes the row from HAVE where Id=3, and then takes the lines from have in order (really it does this:)

*check to see if _n_=1; it is; so take row id=3;
*take first row (id=1);
*check to see if _n_=1; it is not;
*take second row (id=2);
... continue ... 

Variables that are in a SET statement are automatically retained, so NUMBER3 is automatically retained (yay!) and not set to missing between iterations of the data step loop. As long as you don't modify the value, it will stay for each iteration.

Upvotes: 2

BellevueBob
BellevueBob

Reputation: 9618

There are several ways to do this; here is one using the SAS POINT= option:

data have;
   input ID  Number;
datalines;
1   2
2   5
3   6
4   1
run;

data want;
   retain adder;
   drop adder;
   if _n_=1 then do;
      adder = 3;
      set have point=adder;
      adder = number;
      end;

   set have;
   number = number + adder;
run;

The RETAIN and DROP statements define a temp variable to hold the value you want to add. RETAIN means the value is not to be re-initialized to missing each time through the data step and DROP means you do not want to include that variable in the output data set.

The POINT= option allows one to read a specific observation from a SAS data set. The _n_=1 part is a control mechanism to only execute that bit of code once, assigning the variable adder to the value of the third observation.

The next section reads the data set one observation at a time and adds applies your change.

Note that the same data set is read twice; a handy SAS feature.

Upvotes: 4

Related Questions