Laura
Laura

Reputation: 45

Create new row to data set based existing ones SAS

I have a dataset looking something like this:

var1 var2   count 
cat1    no     1
cat1    yes    4
cat1    unkown 3
cat2    no     7
cat2    yes    3
cat2    unkown 5
cat3    no     2
cat3    yes    9
cat3    unkown 0

What I want to do is combine var1 & var2 into new variable where first row is from var1 and the others from var2. So it supposed to look like:

comb   count 
cat1    
no     1
yes    4
unkown 3
cat2
no     7    
yes    3
unkown 5
cat3    
no     2
yes    9
unkown 0

Any help would be highly appreciated!

Upvotes: 0

Views: 408

Answers (1)

Thogerar
Thogerar

Reputation: 339

It's quite simple.

Here the solution :

1) create the dataset source:

data testa;
 infile datalines dsd dlm=',';
   input var1 : $200. var2 : $200. count : 8. ;
   datalines;
cat1,no,1,
cat1,yes,4,
cat1,unkown,3,
cat2,no,7,
cat2,yes,3,
cat2,unkown,5,
cat3,no,2,
cat3,yes,9,
cat3,unkown,0,
; 
run;

2) Selection of var list : cat1|cat2|cat3

proc sql;
    select distinct(var1) into: list_var separated by '|' from testa;
run;

3) Process the var list one by one

%macro processListVar(list_var);

    data want;
    run;

    %let k=1;
    %do %while (%qscan(&list_var, &k,|) ne );
        %let var = %scan(&list_var, &k,|);


        data testb(drop=var1 rename=(var2=comb));
            set testa;
            N=_N_+1+&k;
            where var1="&var";
        run;

        data testc;
            N=1+&k;
            comb="&var";
            count=.;
        run;

        data tmp;
            set testb testc;
        run;

        proc sort data=tmp out=teste;
        by N;
        run;

        data want;
            set want teste;
        run;

        %put var=&var;
        %let k = %eval(&k + 1);
    %end;

%mend processListVar;

%processListVar(&list_var);

4) At the end you get the result in dataset want.

You have to exclude finaly the N column like that :

data want_cleaned (drop=N);
set want;
run;

5) More explanation on the code.

a. The key problem was to keep the order between cat1,cat2,cat3.

b. So I divided the problem by each dataset cat1, cat2, .. and created a %do %while to loop through categories.

c. We use the column N, to count the number of line (like an index), and then we can sort on this column, to keep the order.

d. For example : the first var cat1 : We select the column var2, we rename it like the comb column. We drop the var1 column. It create the testb dataset. The testb dataset is used to create an index (column N) and we create the first line of our subdataset (N=1+&k) in testc. &k is used through all subdatasets. Like that the index is continuing between subdatasets. (without interfering each others). We make a merge between testb and testc. The dataset tmp contains all info needed for cat1. Then we merge all subdatasets in dataset want.

So to summary, we create a loop, and we merge the datasets together at the end. We make a sort on the column N, to display lines in the order you wanted.

Regards,

Upvotes: 1

Related Questions