Cardinal
Cardinal

Reputation: 228

Add new column showing value change from now to next within id group

I want to create a new table. I want to go from

+-----+------+------+
| id  | time | var1 |
+-----+------+------+
|   1 |    1 |   16 |
|   1 |    2 |   16 |
|   2 |    1 |   10 |
|   2 |    2 |   11 |
|   2 |    3 |   12 |
+-----+------+------+

to this

+----+------+------+----------+
| id | time | var1 |  change  |
+----+------+------+----------+
|  1 |    1 |   16 | 16 to 16 |
|  2 |    1 |   10 | 10 to 11 |
|  2 |    2 |   11 | 11 to 12 |
+----+------+------+----------+

That is to say, I want to add a new column which gives the changes over time per identity. This change should be pasted to the "old" value and not the the "new" value. That is, the change of var1 from time 1 to time 2 should be pasted to time 1 and the last time unit should be removed. I hope someone could help me out. Thanks in advance!

Upvotes: 0

Views: 203

Answers (2)

Tom
Tom

Reputation: 51601

Much easier if you just sort by descending time.

data have;
  input id time var1;
cards;
1 1 16
1 2 16
2 1 10
2 2 11
2 3 12
;

proc sort data=have;
  by id descending time;
run;

data want ;
  set have ;
  by id ;
  length change $20 ;
  change = catx(' to ',var1,lag(var1));
  if not first.id;
run;

proc sort;
  by id time;
run;

Results:

Obs    id    time    var1     change

 1      1      1      16     16 to 16
 2      2      1      10     10 to 11
 3      2      2      11     11 to 12

Upvotes: 1

Richard
Richard

Reputation: 27508

You will need to learn about LAG and FIRST. flag values that are automatically set when BY statements are present in DATA Step.

data want;
  set have;
  by id;

  prior_var1 = LAG (var1);
  prior_time = LAG (time);

  length change $30;
  if first.id and last.id then do;
    change = 'None, id has one item';
  end;
  else do;
    time = prior_time;
    change = catx(' ', prior_var1, 'to', var1);
  end;

  drop prior:;
run;

Another approach is to create a lead value of VAR1 by opening the data set a second time, advanced by one observation, in a 1:1 MERGE. The first.id and last.id implicit variables are computed directly and not carried into the output data set.

(Fixed after @Tom comment)

data have;
  do id = 1 to 5;
    do time = 1 to 3; 
      x+1;
      output;
    end;
  end;
  x+1;
  output;
run;

data want(drop=lead_:);
  merge have have(firstobs=2 keep=id x rename=(id=lead_id x=lead_x));

  first.id = id ne lag(id);
  last.id = id ne lead_id;

  length change $30;
  if first.id and last.id then 
    change = 'n/a';
  else
  if not last.id then 
    change = catx(' ', x, 'to', lead_x);

  if not last.id or first.id;
run;

Upvotes: 1

Related Questions