Remy M

Reputation: 619

SAS Using Data Set to Create Other Data Sets

I am supposed to create a summary data set containing the mean, median, and standard deviation broken down by gender and group (using the CLASS statement). Using this summary data set, create four other data sets (in one DATA step) as follows:

(1) grand mean
(2) stats broken down by gender
(3) stats broken down by group
(4) stats broken down by gender and group

We were given the hint to use the CHARTYPE option.

My attempted solution is below, but I don't think it does what was asked.

DATA CLINICAL;
   *Use LENGTH statement to control the order of
    variables in the data set;
   LENGTH PATIENT VISIT DATE_VISIT 8;
   RETAIN DATE_VISIT WEIGHT;
   DO PATIENT = 1 TO 25;
      IF RANUNI(135) LT .5 THEN GENDER = 'Female';
      ELSE GENDER = 'Male';
      X = RANUNI(135);
      IF X LT .33 THEN GROUP = 'A';
      ELSE IF X LT .66 THEN GROUP = 'B';
      ELSE GROUP = 'C';
      DO VISIT = 1 TO INT(RANUNI(135)*5);
         IF VISIT = 1 THEN DO;
             DATE_VISIT = INT(RANUNI(135)*100) + 15800;
             WEIGHT = INT(RANNOR(135)*10 + 150);
         END;
         ELSE DO;
            DATE_VISIT = DATE_VISIT + VISIT*(10 + INT(RANUNI(135)*50));
            WEIGHT = WEIGHT + INT(RANNOR(135)*10);
         END;
         OUTPUT;
         IF RANUNI(135) LT .2 THEN LEAVE;
      END;
   END;
   DROP X;
   FORMAT DATE_VISIT DATE9.;
RUN;
PROC MEANS DATA=CLINICAL;
CLASS GENDER GROUP;
OUTPUT OUT=SUMMARY
       MEAN=
       MEDIAN=
       STDDEV= / AUTONAME;
RUN;

Upvotes: 1

Views: 107

Answers (1)

Joe

Reputation: 63424

No, what they're asking you to do is:

  1. Use the OUTPUT statement in PROC MEANS to create a summary dataset. Choose the appropriate TYPES and CLASS statements in PROC MEANS so that all four sets of data are represented in the output dataset.
  2. Using a single data step that has four dataset names on the DATA statement, selectively output each row to the correct dataset. You would use the _TYPE_ variable to determine which dataset a row is output to.
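
The two steps above can be sketched roughly as follows. This is only one way to do it, under some assumptions: WEIGHT is taken to be the analysis variable (the question does not say which variable to summarize), and the four output dataset names (GRAND, BY_GENDER, BY_GROUP, BY_GENDER_GROUP) are made up for illustration. No TYPES statement is used because, without NWAY, PROC MEANS already writes every combination of the CLASS variables to the output dataset.

```sas
/* Step 1: summary dataset containing every CLASS combination.    */
/* CHARTYPE stores _TYPE_ as a character string of 0s and 1s,     */
/* one position per CLASS variable, in CLASS statement order.     */
PROC MEANS DATA=CLINICAL NOPRINT CHARTYPE;
   CLASS GENDER GROUP;
   VAR WEIGHT;                /* assumption: WEIGHT is the analysis variable */
   OUTPUT OUT=SUMMARY
          MEAN=
          MEDIAN=
          STDDEV= / AUTONAME;
RUN;

/* Step 2: a single DATA step with four (hypothetically named)    */
/* output datasets; each row is routed according to _TYPE_.       */
DATA GRAND BY_GENDER BY_GROUP BY_GENDER_GROUP;
   SET SUMMARY;
   IF      _TYPE_ = '00' THEN OUTPUT GRAND;            /* no CLASS variable used */
   ELSE IF _TYPE_ = '10' THEN OUTPUT BY_GENDER;        /* GENDER only            */
   ELSE IF _TYPE_ = '01' THEN OUTPUT BY_GROUP;         /* GROUP only             */
   ELSE IF _TYPE_ = '11' THEN OUTPUT BY_GENDER_GROUP;  /* GENDER and GROUP       */
RUN;
```

The comparisons use quoted strings because CHARTYPE makes _TYPE_ a character variable; without CHARTYPE the same tests would be against the numbers 0, 2, 1, and 3.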

CHARTYPE just means your _TYPE_ variable will look like 1001 instead of 9 (the binary representation, basically, stored as a character string). Each position in 1001 corresponds to one class variable, and a 1 marks the variables used to create that breakout (here, the first and the fourth). With only two class variables, the possible values are 00, 01, 10, and 11; with CLASS GENDER GROUP, 10 is the breakdown by GENDER and 01 is the breakdown by GROUP. This is sometimes easier for non-programmers who aren't used to thinking in binary: without CHARTYPE these values would be 0, 1, 2, and 3 in decimal, which makes it harder to tell which value corresponds to which variable.
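
To see the two encodings side by side, something like the sketch below could be run against the CLINICAL dataset from the question. WEIGHT as the analysis variable and the names TYPE_NUM, TYPE_CHAR, and MEAN_WEIGHT are assumptions chosen only for this illustration.

```sas
/* Default: _TYPE_ is numeric (0, 1, 2, 3 for two CLASS variables) */
PROC MEANS DATA=CLINICAL NOPRINT;
   CLASS GENDER GROUP;
   VAR WEIGHT;                       /* assumption: analysis variable */
   OUTPUT OUT=TYPE_NUM MEAN=MEAN_WEIGHT;
RUN;

/* With CHARTYPE: _TYPE_ is character ('00', '01', '10', '11') */
PROC MEANS DATA=CLINICAL NOPRINT CHARTYPE;
   CLASS GENDER GROUP;
   VAR WEIGHT;
   OUTPUT OUT=TYPE_CHAR MEAN=MEAN_WEIGHT;
RUN;

/* Print both so each _TYPE_ value can be matched to the rows    */
/* where GENDER and/or GROUP are filled in rather than missing.  */
PROC PRINT DATA=TYPE_NUM;
   VAR _TYPE_ GENDER GROUP _FREQ_ MEAN_WEIGHT;
RUN;

PROC PRINT DATA=TYPE_CHAR;
   VAR _TYPE_ GENDER GROUP _FREQ_ MEAN_WEIGHT;
RUN;
```

In both listings, a class variable that was not used for a given row shows up as missing, which makes it easy to confirm which _TYPE_ value belongs to which breakdown.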

Upvotes: 1
