Reputation: 295
I'm trying to use an array to accomplish the following. However, I've never used one before and I'm having a bit of trouble.
Basically I want to take the following data and create one row(observation) for each acct, seq1, seq2 combination:
acct seq1 seq2 la ln
9999 1 10 20.01 100
9999 1 10 19.05 1
9999 2 11 26.77 100
9999 2 11 24.86 1
8888 3 20 38.43 218
8888 3 20 37.53 1
This is what I want to end up with. Note, I'm only showing la1 through la3 and ln1 through ln3 for the sake of space. It would actually to go to la7 and ln7:
acct seq1 seq2 la1 la2 la3 ln1 ln2 ln3
9999 1 10 20.01 19.05 . 100 1 .
9999 2 11 26.77 24.86 . 100 1 .
8888 3 20 38.43 37.53 . 218 1 .
Here is the code that I've attempted so far. Any assistance would be greatly appreciated:
data want;
set have;
by acct seq1 seq2;
if first.seq2 then do;
array la_array {7} la1-la7;
array ln_array {7} ln1-ln7;
end;
do i = 1 to 7;
la_array(i)=la;
ln_array(i)=ln;
end;
run;
Upvotes: 3
Views: 10968
Reputation: 3043
Ok... you have to keep in mind that SAS arrays are tipically not real vectors or arrays of memory variables like in most other languages. They are just placeholders for dataset variables.
That said, the ARRAY statement is just there to define the mapping between the dataset variables and the shorthand array reference. It is useless to put it where you placed it as it does not initialize anything.
Another thing you should consider is that you need to accumulate the values and write a single row. You cannot do that on the FIRST row as you have not read all the data yet. You have to use the LAST.
A third issue is that the new variables you are defining must preserve the value when you read another row. SAS typically resets all variables that are not in the input dataset to MISSING. To avoid this and accumulate sums, min, max or other values on a group of rows, you must use the RETAIN statement.
Last... you may want to drop any variables you have just used to do your calculations, but are not desired in the ouput dataset... "i" in this case.
I think this should get you what you want:
DATA want (DROP=i);
SET have;
BY acct seq1 seq2;
/* define variables */
ARRAY la_array {7} la1-la7;
ARRAY ln_array {7} ln1-ln7;
RETAIN i la1-la7 ln1-ln7;
IF FIRST.seq2 then do;
/* initialize group variables */
DO i = 1 to 7;
la_array(i)=.;
ln_array(i)=.;
END;
/* initialize index variable */
i = 1
END;
/* read input row values into array variables */
la_array(i)=la;
ln_array(i)=ln;
i = i + 1;
/* write output row if group is finished */
IF LAST.seq2;
RUN;
Unfortunately I do not have a SAS installation available at the moment... you'll have to try it and let me know how it goes.
Upvotes: 1
Reputation: 7602
I'd go down a different route and take advantage of a little used feature in PROC SUMMARY (or MEANS) to achieve this. The output table is sorted by ACCT as this is the first variable in the CLASS statement, obviously you can easily resort by SEQ1 order afterwards.
The IDGROUP statement in conjunction with OUT[7] creates 7 variables for each of the variables listed (LA, LN) and populates them in sequence as the data is read. The AUTONAME option adds sequential numbers to the variable names (e.g. LN1, LN2,...,LN7). The only problem I've come across with this method is an out of memory error on the server where the source dataset is very large, this being due to PROC SUMMARY being run in memory. Hopefully it will work for you in this instance.
proc summary data=have nway;
class acct seq1 seq2;
output out=want (drop=_:)
idgroup(out[7] (la ln)=) / autoname;
run;
Upvotes: 5