user3749243
user3749243

Reputation: 13

sas coding: choosing max variable

I have two tables and need to create one more table working with other two:

  first_table:                      SECOND TABLE
id     term                      id           term      majr_code
3      2014                       3           2010     ACT
3      2015                       3           2010     ACT
4      2014                       3           2011     GNST
4      2015                       3           2015     BUSA
5      2013                       3           2015     BUSA
5      2014                       4           2009     TIM  
6      2013                       4           2010     BAL
6      2014                       4           2014     TAR
                                  5           2011     SAR
                                  5           2013    COR
                                  6           2010     PAT
                                  6           2013     TOR

This is two tables I have. I need to create another table which is same with first table and adding one more column majr_code.

    first_table:                      
id     term      majr_code               
3      2014       GNST              
3      2015       BUSA                
4      2014       TAR              
4      2015       TAR                
5      2013       COR
5      2014       COR          
6      2013       TOR              
6      2014       TOR

what I need to do is, for the same id if second table has the same term with first table, I will keep same majr_code. For example: For first table has 2014 and second table has 2011 and 2015, I need to use 2011's majr_Code for 2014 term. For example: first table has 2013 and 2014 terms for the same id, and if second table's highest term is 2013, I will keep same majr_Code for 2013 and 2014

I know its complicated, it should be more clear if you check the tables and result. If still complicated, I can delete the question. This is how I can explain. Thanks!

Upvotes: 0

Views: 43

Answers (2)

Acie
Acie

Reputation: 58

I think the below code should do the trick. It works as follows:

1) reads in the sample datasets.

2) Create a table titled second_table_nogaps which is just the second_table but with no yearly gaps up through 2015. Basically, for each ID in the second table, it checks if a given yearly record exists. If so, the record is output, if not, it creates a new record with the prior year's majr_code. If the last record for a given id is not 2015, then new records are generated up through 2015. (for example a new record is created for id=4, year=2014, majr_code = TAR)

3) Merged the unique values of id+term+majr_code to first_table. The resulting table First_table_2 should be what you're looking for! However, BE CAREFUL, if there are multiple majr_codes for the same id+term this step will result in duplication.

Hope this helps! The code in step 2 could probably be simplified as my handling of the first and last record was not particularly efficient.

  data first_table;
        infile datalines ;
        input id term;
        datalines ;
        3      2014 
        3      2015 
        4      2014 
        4      2015 
        5      2013 
        5      2014 
        6      2013 
        6      2014
        ;
    run;


data second_table;
    infile datalines ;
    input id term majr_code $;
    datalines ;
        3   2010    ACT
        3   2010    ACT
        3   2011    GNST
        3   2015    BUSA
        3   2015    BUSA
        4   2009    TIM
        4   2010    BAL
        4   2014    TAR
        5   2011    SAR
        5   2013    COR
        6   2010    PAT
        6   2013    TOR
    ;
run;

proc sort data=second_table ; by id term; run;

data second_table_nogaps (keep=id_nogaps term_nogaps majr_code_nogaps );
    set second_table end=eof;
    retain id_nogaps term_nogaps majr_code_nogaps ;

    *first set up the first row... establishes retained variables and outputs;
    if _N_ = 1 then do;
                id_nogaps = id ; 
                term_nogaps = term;
                majr_code_nogaps = majr_code;
                output;
        end;

        *for all but the first and last row;
        else if not eof then do;
            do while (  (term_nogaps + 1 < term ) /*this is to fill in gaps between years. (e.g. major code in 2011 and major code in 2014 within the same id*/
                        or 
                        ((id_nogaps ne id) and term_nogaps < 2015) /*this is to fill major code for all terms up through 2015 (e.g. last major code for id 4 is in 2014)*/
                      );
                term_nogaps = term_nogaps + 1;
                output;
            end;

            id_nogaps=id;
            term_nogaps = term;
            majr_code_nogaps=majr_code;
            output;
        end;

        else do;
            do while (term_nogaps + 1 < term );
                term_nogaps = term_nogaps + 1;
                output;
            end;
            id_nogaps=id;
            term_nogaps = term;
            majr_code_nogaps=majr_code;
            output;
            do while ( term_nogaps < 2015 );
                term_nogaps = term_nogaps + 1;
                output;
            end;
    end;
run;

proc sql;
    create table First_table_2 as 
    Select a.* , b.majr_code_nogaps as majr_code
    from first_table a
        left join 
            (select distinct id_nogaps, term_nogaps, majr_code_nogaps from second_table_nogaps) b /*select distinct values to prevent duplication*/
    on a.id   =   b.id_nogaps  and a.term = b.term_nogaps;
quit;

Upvotes: 1

Joe
Joe

Reputation: 63424

There are a few approaches to this, but sql is probably easiest. You don't provide code, so i'll just include a pointer. You need to use having to filter the table after it's been grouped to having term=max(term).

Upvotes: 0

Related Questions