Mompolooo
Mompolooo

Reputation: 5

ARRAY function in SAS after MEAN (by grouping)

There is some homework for SAS and I just can't seem to find the right way to do it. Hopefully, some of you will be able to help.

We start with a table where we have the following variables:

City     State     Temp January     Temp Feb     Temp Mar    ...    Temp Dec

First, we have to calculate the mean temperature (per month, so for 12 different variables) and per state (so there are always a few cities per state).

I used this code:

PROC SORT DATA=Homework;
BY state;
RUN;
PROC MEANS DATA=Homework;   
    VAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC;
    BY State
    OUTPUT OUT=MTSM (DROP=_type_ _freq_) MEAN=;
    RUN; 

My result is a table in which I have 53 rows (one per state) and 1 column per month (and a first column for the states of cours). Something like this:

State     JAN       FEB     ...     DEC
State1    xjan      xfeb    ...     xdec
State2                   
...                   
State53                   

Now I need to use an Array statement to make a new table in long format:

State     Month      Mean_temp
State1    JAN        xjan
state1    FEB        xfeb
.         MAR        ...
.         APR        ...
.         ...        ... 
State1    DEC        xdec
State 2   JAN        ...
          ...
          DEC
...
State53   JAN
          FEB
          ...

Does someone have an idea of how to do this? I'm completely lost. This is what I tried:

DATA MTSM2;
    SET MTSM;
    BY state;
    ARRAY newvars {1} Mean_Temp;
    ARRAY oldvars {1, 12} JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC;
    DO Month = JAN to DEC;
    DO k=1;
        newvars{k} = oldvars{k, Month};
    END;
    OUTPUT;
    END;
    KEEP state Month Mean_Temp;
RUN;

I got following error: ERROR: Array subscript out of range at line 30 column 22 :'(

What am I doing wrong? I have been changing this in many ways, but always get the same error.

Thanks in advance!

Upvotes: 0

Views: 517

Answers (4)

Richard
Richard

Reputation: 27508

If the homework is to pivot the data, using ARRAY, from the categorically organized layout (state/month/mean) to a wide layout (state/month-1...month-12) you can use BY processing and index determination to fill an array.

Essentially for each BY group there will be one row output.

One way is to use a DOW loop in which the SET statement is inside an explicit loop.

data want(keep=state jan--dec);
  do until (last.state);
    set have;
    by state;
    array months jan feb mar apr may jun jul aug sep oct nov dec;
    index = (index('JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC', trim(month))+2)/3;
    months(index) = mean;
  end;
run;

If the data is known to have every month, the index 'lookup' is not needed and can be retrieved directly from the do loop index variable:

data want(keep=state jan--dec);
  do _n_ = 1 by 1 until (last.state);   /* repurpose _n_ */
    set have;
    by state;
    array months jan feb mar apr may jun jul aug sep oct nov dec;
    months(_n_) = mean;
  end;
run;

Update

Using array to pivot data from an across layout to a down layout. Iterate a loop over the array elements and output name/value pairs within the loop.

data want (keep=state month percent);
  set have;
  array months jan feb mar apr may jun jul aug sep oct nov dec;

  do _n_ = 1 to dim(months);
    month = vname(months(_n_));     /* name */
    percent = months(_n_);          /* value */
    OUTPUT;
  end;
run;

Proc TRANSPOSE can perform the same data transformation.

Array based pivoting is very useful when you want to transpose two or more arrays at the same time. An example would be if you had variables

  • jan_percent to dec_percent and
  • jan_rating to dec_rating

that you wanted to pivot into a data form of month/percent/rating. Such a transformation with TRANSPOSE requires multiple proc steps (one per array).

Upvotes: 1

Tom
Tom

Reputation: 51566

You are very close.

There is no need to use ARRAY for the new variable since it is just one. There is no need to tell SAS how many variables there are in the array when you have listed the actual variable names. And arrays are indexed by integers, not strings. You can use the VNAME() function to find the name of the variable addressed by the index into the array. The BY statement is not needed.

DATA MTSM2;
    SET MTSM;
    ARRAY oldvars JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC;
    length month $32 mean_temp 8;
    DO month_number = 1 to 12 ;
        month=vname( oldvars[month_number] );
        mean_temp = oldvars[month_number] ;
        OUTPUT;
    END;
    KEEP state Month Mean_Temp;
RUN;

Upvotes: 1

user667489
user667489

Reputation: 9569

You can get the table you want by using a more specific output statement in proc means/proc summary:

/*Generate some dummy data*/
data have;
call streaminit(1);
do j = 1 to 10;
  do state = 'a', 'b', 'c';
    array months[12] m1-m12;
    do i = 1 to dim(months);
      months[i] = rand('uniform');
    end;
    output;
  end;
end;
drop i j;
run;

proc summary nway data = have;
 var m1-m12;
 class state;
 output out = want(drop = _TYPE_ _FREQ_) mean=;
run;

Upvotes: 2

PeterClemmensen
PeterClemmensen

Reputation: 4937

Sounds like you simply want to use a Class Statement instead of a By Statement?

Upvotes: 0

Related Questions