I am trying to write a data step in SAS, for later use with proc rank, that creates six groups (the group variable) of eight subjects each (the subject variable), with a random number assigned to each subject (the cohort variable, which is used later in proc rank). This is pretty straightforward, except that I want my subjects numbered 1-48 while still being split into six groups (A, B, C, etc.). A plain nested do loop would be fine for giving groups A, B, etc. each a subject 1 through subject 8, but I want group A to have subjects 1-8, B to have 9-16, and so on. Right now, I have the following code to do this:
data treatment;
  do group = 'A', 'B', 'C', 'D', 'E', 'F';
    if group = 'A' then do subject = 1 to 8;
      cohort = ranuni(1234);
      output;
    end;
    else if group = 'B' then do subject = 9 to 16;
      cohort = ranuni(1234);
      output;
    end;
    else if group = 'C' then do subject = 17 to 24;
      cohort = ranuni(1234);
      output;
    end;
    else if group = 'D' then do subject = 25 to 32;
      cohort = ranuni(1234);
      output;
    end;
    else if group = 'E' then do subject = 33 to 40;
      cohort = ranuni(1234);
      output;
    end;
    else if group = 'F' then do subject = 41 to 48;
      cohort = ranuni(1234);
      output;
    end;
  end;
run;
This does work, but it's a mess. Is there a way to have "subject" index from 1 to 8 for group A, then 9 to 16 for group B, and so on, WITHOUT all the conditionals? I imagine there are other tools in SAS (macros? proc sql?) that would make this much easier, but right now I'm limited to do loops in the data step.
Disclaimer: This is for a homework assignment for a first-year SAS class. My code works and does what I need it to do right now (and I'll submit it as-is if I can't figure anything else out), but I know it's extremely inefficient, and I can't find anything on how to get rid of all these if-else statements. (It's possible I just don't know what to search for. I've read several pages on using nested do loops, but nothing that seems to help with my problem; everything I find here seems to concern do loops in macros, and I'm not there yet.)
I do not want my code rewritten entirely (it's homework, I need to do it myself!), but I would appreciate any pointers in the right direction, even if they're just search terms. At this point I'm completely stuck on what I'd need to look up to get this to work.
There are, as you expect, a million and a half ways to solve this problem in SAS.
So I assume you want a dataset where you have
A 1
A 2
A 3
A 4
A 5
A 6
A 7
A 8
B 9
B 10
B 11
...
F 48
plus a random cohort value on each row. The way I'd do that is to calculate the pieces separately.
You in effect have a single loop from 1 to 48, and the A-F grouping is applied on top of that loop, right? So you should try to structure it this way:
data want;
  /* no SET statement: the DO loop itself generates all 48 rows */
  do subject = 1 to 48;
    group=<logic to determine group>;
    cohort=<logic to determine cohort>;
    output;
  end;
run;
There are a few different ways to do <logic to determine group>; the 'worst' way is a series of if statements, i.e.:
if subject le 8 then group='A';
else if subject le 16 then group='B';
...
else group='F';
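For reference, here is a minimal runnable sketch of the single-loop structure filled in with that if/else chain (deliberately the 'worst' option, so it doesn't give away the single-statement approach). It assumes the question's `treatment` dataset name and uses `rand` with `call streaminit`, as recommended further down in this answer; swap in `ranuni(1234)` if your class requires it.

```sas
/* Single 1-48 loop; group is derived from subject with an if/else chain */
data treatment;
  length group $1;
  call streaminit(1234);              /* seed the RAND stream once */
  do subject = 1 to 48;
    if subject le 8 then group = 'A';
    else if subject le 16 then group = 'B';
    else if subject le 24 then group = 'C';
    else if subject le 32 then group = 'D';
    else if subject le 40 then group = 'E';
    else group = 'F';
    cohort = rand('uniform');         /* uniform random number between 0 and 1 */
    output;                           /* one row per subject */
  end;
run;
```

The loop variable is the subject number, so the 1-48 numbering comes for free and only the group letter has to be computed.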
There are several good options I could see for determining this in one single statement without conditional logic. If you want to figure this out for yourself, do so; if you want a hint or an explanation, comment such and I'm happy to explain how I'd do it, but I think it's better left unsaid for now (particularly as the exact method might depend on what you've learned to date).
A second option is to not use a loop for your subject at all, but a counter.
do class='A','B',...;
  subjID+1;
  cohort=...;
end;
That is basically how you keep a counter 'external to the loop': it's not a true programming loop itself, but it lets you keep track of the ID. You'll see this pattern very commonly elsewhere, and it may be what your instructor was getting at. In your particular example I prefer the single 1:48 loop solution, as it avoids so much hardcoding of letters, but this is a common solution as well.
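A runnable sketch of the counter approach, assuming the question's `group` variable and `treatment` dataset name. The inner `do i = 1 to 8` loop and the helper variable `i` are additions for illustration, since the outline above elides how the eight subjects per group are produced. The sum statement `subject + 1;` implicitly retains `subject` and initializes it to 0, so it keeps counting across groups instead of restarting at 1.

```sas
data treatment;
  call streaminit(1234);                /* seed the RAND stream once */
  do group = 'A', 'B', 'C', 'D', 'E', 'F';
    do i = 1 to 8;                      /* eight subjects per group */
      subject + 1;                      /* sum statement: counter runs 1 to 48 overall */
      cohort = rand('uniform');
      output;
    end;
  end;
  drop i;                               /* i is only a loop index */
run;
```

Compared with the question's version, the subject range is no longer written out per group; only the list of letters is hardcoded.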
One side note: I strongly recommend not learning ranuni and instead learning to use the rand function. ranuni is based on an inferior PRNG; rand is strictly superior, and it has the bonus advantage that you don't have to keep uselessly repeating the seed in every call (the ranuni seed doesn't actually have any effect after the first call!). With rand, you set the seed once with call streaminit. If your teacher has instructed you to use ranuni, I suggest learning both and only using ranuni in homework assignments that are submitted back to the class. If your teacher is interested in learning why, Rick Wicklin has written a good explanation.
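A side-by-side sketch of the two seeding styles described above. The dataset names `old_way` and `new_way` are hypothetical; the two data steps produce different random streams, so their `cohort` values will not match.

```sas
/* ranuni: the seed is passed on every call, but only the first call uses it */
data old_way;
  do subject = 1 to 48;
    cohort = ranuni(1234);
    output;
  end;
run;

/* rand: seed the stream once with CALL STREAMINIT, then call RAND with no seed */
data new_way;
  call streaminit(1234);
  do subject = 1 to 48;
    cohort = rand('uniform');
    output;
  end;
run;
```

Because the seed lives in `call streaminit`, the loop body no longer carries a meaningless repeated argument.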
If you really like the double loop, there is a way to do this with two loops, but it requires the same basic concept that the 1:48 loop above does. I've kept that version hidden in the answer's source (SO has no spoiler tags), so you can attempt the first problem completely spoiler-free.