Reputation: 525
I have a dataset of money earned as a % every week in 2017 to 2018. Some don't have data at the start of 2017 as they didn't start earning until later on. The weeks are numbered as 201701, 201702 - 201752 and 201801 - 201852.
What I'd like to do is have 104 new variables called WEEK0 - WEEK103, where WEEK0 will have the first non empty column value of the money earned columns. Here is an example of the data:
MON_EARN_201701 MON_EARN_201702 MON_EARN_201703 MON_EARN_201704
30 21 50 65
. . 30 100
. 102 95 85
Then I want my data to have the following columns (example)
WEEK0 WEEK1 WEEK2 WEEK3
30 21 50 65
30 100 . .
102 95 85 .
These are just small examples of a very large dataset.
I was thinking I'd need to try and do some sort of do loops so what I've tried so far is:
DATA want;
SET have;
ARRAY mon_earn{104} mon_earn_201701 - mon_earn_201752 mon_earn_201801 -mon_earn_201852;
ARRAY WEEK {104} WEEK0 - WEEK103;
DO i = 1 to 104;
IF mon_earn{i} NE . THEN;
WEEK{i} = mon_earn{i};
END;
END;
RUN;
This doesn't work as it doesn't fill the WEEK0 when the first value is empty.
If anymore information is needed please comment and I will add it in.
Upvotes: 0
Views: 1822
Reputation: 51621
Sounds like you just need to find the starting point for copying.
First look thru the list of earnings by calendar month until you find the first non missing value. Then copy the values starting from there into you new array of earnings by relative month.
data want;
set have;
array mon_earn mon_earn_201701 -- mon_earn_201852;
array week (104);
do i = 1 to dim(mon_earn) until(found);
if mon_earn{i} ne . then found=1;
end;
do j=1 to dim(week) while (i+j<dim(mon_earn));
week(j) = mon_earn(i+j-1);
end;
run;
NOTE: I simplified the ARRAY definitions. For the input array I assumed that the variables are defined in order so that you could use positional array list. For the WEEK array SAS and I both like to start counting from one, not zero.
Upvotes: 1
Reputation: 27526
You need a second index variable, call it j
, to target the proper week assignment. j
is only incremented when a months earning is not missing.
This example code will 'squeeze out` all missing earnings; even those missing earnings that occurring after some earning has occurred. For example
earning: . . . 10 . 120 . 25 …
will squeeze to
week: 10 120 25 …
data have;
array earn earn_201701-earn_201752 earn_201801-earn_201852;
do _n_ = 1 to 1000;
call missing (of earn(*));
do _i_ = 1 + 25 * ranuni(123) to dim(earn);
if ranuni(123) < 0.95 then
earn(_i_) = round(10 + 125 * ranuni(123));
end;
output;
end;
run;
data want;
set have;
array earn earn_201701-earn_201752 earn_201801-earn_201852;
array week(0:103);
j = -1;
do i = 1 to dim(earn);
if not missing(earn(i)) then do;
j+1;
week(j) = earn(i);
end;
end;
drop i j;
run;
If you want to maintain interior missing earnings the logic would be
if not missing(earn(i)) or j >=0 then do;
j+1;
week(j) = earn(i);
end;
Upvotes: 1
Reputation: 21294
You could do this if it was a long format. There's a chance you don't need it while in a long format.
proc sort data=have;
by ID week;
run;
data want;
set have;
by id; *for each group/id counter;
retain counter;
if first.id then counter=0;
if counter=0 and not missing(value) then do;
counter=1; new_week=0; end;
if counter = 1 then new_week+1;
run;
If you really need it wide:
Assign week to mon_earned from i to end of week.
data want;
set have;
array mon_earned(*) .... ;
array week(*) ... ;
found=0; i=0;
do while(found=0);
if not missing(mon_earned(i)) then found=1;
i+1;
end;
z=0;
do j=i to dim(week);
week(z) = mon_earned(j);
z+1;
end;
run;
Upvotes: 1