Reputation: 13
I have two tables and need to create one more table working with other two:
first_table: SECOND TABLE
id term id term majr_code
3 2014 3 2010 ACT
3 2015 3 2010 ACT
4 2014 3 2011 GNST
4 2015 3 2015 BUSA
5 2013 3 2015 BUSA
5 2014 4 2009 TIM
6 2013 4 2010 BAL
6 2014 4 2014 TAR
5 2011 SAR
5 2013 COR
6 2010 PAT
6 2013 TOR
This is two tables I have. I need to create another table which is same with first table and adding one more column majr_code.
first_table:
id term majr_code
3 2014 GNST
3 2015 BUSA
4 2014 TAR
4 2015 TAR
5 2013 COR
5 2014 COR
6 2013 TOR
6 2014 TOR
what I need to do is, for the same id if second table has the same term with first table, I will keep same majr_code. For example: For first table has 2014 and second table has 2011 and 2015, I need to use 2011's majr_Code for 2014 term. For example: first table has 2013 and 2014 terms for the same id, and if second table's highest term is 2013, I will keep same majr_Code for 2013 and 2014
I know its complicated, it should be more clear if you check the tables and result. If still complicated, I can delete the question. This is how I can explain. Thanks!
Upvotes: 0
Views: 43
Reputation: 58
I think the below code should do the trick. It works as follows:
1) reads in the sample datasets.
2) Create a table titled second_table_nogaps
which is just the second_table
but with no yearly gaps up through 2015. Basically, for each ID in the second table, it checks if a given yearly record exists. If so, the record is output, if not, it creates a new record with the prior year's majr_code. If the last record for a given id is not 2015, then new records are generated up through 2015. (for example a new record is created for id=4, year=2014, majr_code = TAR)
3) Merged the unique values of id+term+majr_code to first_table. The resulting table First_table_2
should be what you're looking for! However, BE CAREFUL, if there are multiple majr_codes for the same id+term this step will result in duplication.
Hope this helps! The code in step 2 could probably be simplified as my handling of the first and last record was not particularly efficient.
data first_table;
infile datalines ;
input id term;
datalines ;
3 2014
3 2015
4 2014
4 2015
5 2013
5 2014
6 2013
6 2014
;
run;
data second_table;
infile datalines ;
input id term majr_code $;
datalines ;
3 2010 ACT
3 2010 ACT
3 2011 GNST
3 2015 BUSA
3 2015 BUSA
4 2009 TIM
4 2010 BAL
4 2014 TAR
5 2011 SAR
5 2013 COR
6 2010 PAT
6 2013 TOR
;
run;
proc sort data=second_table ; by id term; run;
data second_table_nogaps (keep=id_nogaps term_nogaps majr_code_nogaps );
set second_table end=eof;
retain id_nogaps term_nogaps majr_code_nogaps ;
*first set up the first row... establishes retained variables and outputs;
if _N_ = 1 then do;
id_nogaps = id ;
term_nogaps = term;
majr_code_nogaps = majr_code;
output;
end;
*for all but the first and last row;
else if not eof then do;
do while ( (term_nogaps + 1 < term ) /*this is to fill in gaps between years. (e.g. major code in 2011 and major code in 2014 within the same id*/
or
((id_nogaps ne id) and term_nogaps < 2015) /*this is to fill major code for all terms up through 2015 (e.g. last major code for id 4 is in 2014)*/
);
term_nogaps = term_nogaps + 1;
output;
end;
id_nogaps=id;
term_nogaps = term;
majr_code_nogaps=majr_code;
output;
end;
else do;
do while (term_nogaps + 1 < term );
term_nogaps = term_nogaps + 1;
output;
end;
id_nogaps=id;
term_nogaps = term;
majr_code_nogaps=majr_code;
output;
do while ( term_nogaps < 2015 );
term_nogaps = term_nogaps + 1;
output;
end;
end;
run;
proc sql;
create table First_table_2 as
Select a.* , b.majr_code_nogaps as majr_code
from first_table a
left join
(select distinct id_nogaps, term_nogaps, majr_code_nogaps from second_table_nogaps) b /*select distinct values to prevent duplication*/
on a.id = b.id_nogaps and a.term = b.term_nogaps;
quit;
Upvotes: 1
Reputation: 63424
There are a few approaches to this, but sql
is probably easiest. You don't provide code, so i'll just include a pointer. You need to use having
to filter the table after it's been grouped to having term=max(term)
.
Upvotes: 0