Reputation: 3
Here is a simple example I came up with. There are 3 players here (id is 1,2,3) and each player gets 3 attempts at the game (attempt is 1,2,3).
data have;
infile datalines delimiter=",";
input id attempt score;
datalines;
1,1,100
1,2,200
2,1,150
3,1,60
;
run;
I would like to add in rows where the score is missing if they did not play attempt 2 or attempt 3.
data want;
set have;
by id attempt;
* ??? ;
run;
proc print data=have;
run;
The output would look something like this.
1 1 100
1 2 200
1 3 .
2 1 150
2 2 .
2 3 .
3 1 60
3 2 .
3 3 .
How do I go about doing this?
Upvotes: 0
Views: 189
Reputation: 51566
If the absent observations are only at the end then you can just use a couple of OUTPUT statements and a DO loop. So write each observation as it is read and if the last one is NOT attempt 3 then add more observations until you get to attempt 3.
data want1;
set have ;
by id;
output;
score=.;
if last.id then do attempt=attempt+1 to 3;
output;
end;
run;
If the absent attempts can appear any where then you need to "look ahead" to see whether the next observations skips any attempts.
data want2;
set have end=eof;
by id ;
if not eof then set have (firstobs=2 keep=attempt rename=(attempt=next));
if last.id then next=3+1;
output;
score=.;
do attempt=attempt+1 to next-1;
output;
end;
drop next;
run;
Upvotes: 0
Reputation: 27498
Here is a alternative DATA
step, a DOW way:
data want;
do until (last.id);
set have;
by id;
output;
end;
call missing(score);
do attempt = attempt+1 to 3;
output;
end;
run;
Upvotes: 0
Reputation: 3
Update: I figured it out.
proc sort data=have;
by id attempt;
data want;
set have (rename=(attempt=orig_attempt score=orig_score));
by id;
** Previous attempt number **;
retain prev;
if first.id then prev = 0;
** If there is a gap between previous attempt and current attempt, output a blank record for each intervening attempt **;
if orig_attempt > prev + 1 then do attempt = prev + 1 to orig_attempt - 1;
score = .;
output;
end;
** Output current attempt **;
attempt = orig_attempt;
score = orig_score;
output;
** If this is the last record and there are more attempts that should be included, output dummy records for them **;
** (Assumes that you know the maximum number of attempts) **;
if last.id & attempt < 3 then do attempt = attempt + 1 to 3;
score = .;
output;
end;
** Update last attempt used in this iteration **;
prev = attempt;
run;
Upvotes: 0
Reputation: 27498
A table of possible attempts cross joined with the distinct ids left joined to the data will produce the desired result set.
Example:
data have;
infile datalines delimiter=",";
input id attempt score;
datalines;
1,1,100
1,2,200
2,1,150
3,1,60
;
data attempts;
do attempt = 1 to 3; output; end;
run;
proc sql;
create table want as
select
each_id.id,
each_attempt.attempt,
have.score
from
(select distinct id from have) each_id
cross join
attempts each_attempt
left join
have
on
each_id.id = have.id
& each_attempt.attempt = have.attempt
order by
id, attempt
;
Upvotes: 1
Reputation: 61
You could solve this by first creating a table where you have the structure you want to see: for each ID three attempts. This structure can then be joined with a 'left join' to your 'have' table to get the actual scores if they exist and missing variable if they don't.
/* Create table with all ids for which the structure needs to be created */
proc sql;
create table ids as
select distinct id from have;
quit;
/* Create table structure with 3 attempts per ID */
data ids (drop = i);
set ids;
do i = 1 to 3;
attempt = i;
output;
end;
run;
/* Join the table structure to the actual scores in the have table */
proc sql;
create table want as
select a.*,
b.score
from ids a left join have b on a.id = b.id and a.attempt = b.attempt;
quit;
Upvotes: 1