Reputation: 5
There is some homework for SAS and I just can't seem to find the right way to do it. Hopefully, some of you will be able to help.
We start with a table where we have the following variables:
City State Temp January Temp Feb Temp Mar ... Temp Dec
First, we have to calculate the mean temperature (per month, so for 12 different variables) and per state (so there are always a few cities per state).
I used this code:
PROC SORT DATA=Homework;
BY state;
RUN;
PROC MEANS DATA=Homework;
VAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC;
BY State
OUTPUT OUT=MTSM (DROP=_type_ _freq_) MEAN=;
RUN;
My result is a table in which I have 53 rows (one per state) and 1 column per month (and a first column for the states of cours). Something like this:
State JAN FEB ... DEC
State1 xjan xfeb ... xdec
State2
...
State53
Now I need to use an Array statement
to make a new table in long format:
State Month Mean_temp
State1 JAN xjan
state1 FEB xfeb
. MAR ...
. APR ...
. ... ...
State1 DEC xdec
State 2 JAN ...
...
DEC
...
State53 JAN
FEB
...
Does someone have an idea of how to do this? I'm completely lost. This is what I tried:
DATA MTSM2;
SET MTSM;
BY state;
ARRAY newvars {1} Mean_Temp;
ARRAY oldvars {1, 12} JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC;
DO Month = JAN to DEC;
DO k=1;
newvars{k} = oldvars{k, Month};
END;
OUTPUT;
END;
KEEP state Month Mean_Temp;
RUN;
I got following error: ERROR: Array subscript out of range at line 30 column 22
:'(
What am I doing wrong? I have been changing this in many ways, but always get the same error.
Thanks in advance!
Upvotes: 0
Views: 517
Reputation: 27508
If the homework is to pivot the data, using ARRAY
, from the categorically organized layout (state
/month
/mean
) to a wide layout (state
/month-1
...month-12
) you can use BY
processing and index
determination to fill an array.
Essentially for each BY
group there will be one row output.
One way is to use a DOW loop in which the SET
statement is inside an explicit loop.
data want(keep=state jan--dec);
do until (last.state);
set have;
by state;
array months jan feb mar apr may jun jul aug sep oct nov dec;
index = (index('JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC', trim(month))+2)/3;
months(index) = mean;
end;
run;
If the data is known to have every month, the index 'lookup' is not needed and can be retrieved directly from the do loop index variable:
data want(keep=state jan--dec);
do _n_ = 1 by 1 until (last.state); /* repurpose _n_ */
set have;
by state;
array months jan feb mar apr may jun jul aug sep oct nov dec;
months(_n_) = mean;
end;
run;
Update
Using array
to pivot data from an across layout to a down layout. Iterate a loop over the array elements and output
name/value pairs within the loop.
data want (keep=state month percent);
set have;
array months jan feb mar apr may jun jul aug sep oct nov dec;
do _n_ = 1 to dim(months);
month = vname(months(_n_)); /* name */
percent = months(_n_); /* value */
OUTPUT;
end;
run;
Proc TRANSPOSE
can perform the same data transformation.
Array based pivoting is very useful when you want to transpose two or more arrays at the same time. An example would be if you had variables
jan_percent
to dec_percent
andjan_rating
to dec_rating
that you wanted to pivot into a data form of month
/percent
/rating
. Such a transformation with TRANSPOSE
requires multiple proc steps (one per array).
Upvotes: 1
Reputation: 51566
You are very close.
There is no need to use ARRAY for the new variable since it is just one. There is no need to tell SAS how many variables there are in the array when you have listed the actual variable names. And arrays are indexed by integers, not strings. You can use the VNAME()
function to find the name of the variable addressed by the index into the array. The BY statement is not needed.
DATA MTSM2;
SET MTSM;
ARRAY oldvars JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC;
length month $32 mean_temp 8;
DO month_number = 1 to 12 ;
month=vname( oldvars[month_number] );
mean_temp = oldvars[month_number] ;
OUTPUT;
END;
KEEP state Month Mean_Temp;
RUN;
Upvotes: 1
Reputation: 9569
You can get the table you want by using a more specific output
statement in proc means
/proc summary
:
/*Generate some dummy data*/
data have;
call streaminit(1);
do j = 1 to 10;
do state = 'a', 'b', 'c';
array months[12] m1-m12;
do i = 1 to dim(months);
months[i] = rand('uniform');
end;
output;
end;
end;
drop i j;
run;
proc summary nway data = have;
var m1-m12;
class state;
output out = want(drop = _TYPE_ _FREQ_) mean=;
run;
Upvotes: 2
Reputation: 4937
Sounds like you simply want to use a Class Statement instead of a By Statement?
Upvotes: 0