Reputation: 43
i think i know all of the parts on how to do this problem i have just had issues putting them together (outside of one small issue).
I have 2 data sets. one called "old_id" one called "new_id". The old "ID" is 7 characters (9 including hyphens... example : EX-123-45). The new ID format is 9 characters (11 including hypens... example : 88EX-123-45). there are some old_ids that were converted to new_ids but had an 88 or a 55 added in front of them. All other new ids have 22 in front of them and i need to avoid these because they have duplicate id's as the "old_id" but with 22 in front of them.
what i want to do with these is be able to match the info for the "old_ids" to the "new_ids" that are the new id's for the old id's (lol).
i know i can use
data ID_5;
set new_id;
ID5 = substr(right(id),2);
run;
this takes out the leading numbers and i could match on this. with the two files but then i have no way of getting the leading variables back (as far as i know). could i write an if statement to match IF the last 9 characters in "ID" in new_id match the old_id. IF the leading characters for ID in new_id = 88 or 55.
example of what the data i have looks like and what i want the end result
data for old_id.. this also contains 22 overrides (only about 20 though)
location ProductID destination
1 EX-123-45 724
1 EX-12A-B3 724 /* not all old ids have an override*/
5 EX-ABC-12 515
5 EX-ABC-14 953
7 22EX-ZYX-12 007
7 22EX-LMN-12 056
new_id file
location ID destination
1 22EX-123-45 894
1 55EX-123-45 785
5 22EX-987-45 999
5 88EX-ABC-14 881
7 22EX-ZYX-12 678
7 22EX-LMN-12 254
what i want it to look like
location ProductID destination
1 22EX-123-45 894
1 55EX-123-45 724 /*old_id overrides destination*/
5 22EX-987-45 999
5 88EX-ABC-14 953 /*old_id overrides destination*/
7 22EX-ZYX-12 007 /*overwrites new_id destination*/
7 22EX-LMN-12 056 /*overwrites new_id destination*/
the majority of the 22's do not have overrides
Upvotes: 0
Views: 477
Reputation: 51566
First let's convert your listings into actual datasets we can code with.
data old_id ;
input location ProductID :$11. destination $ ;
cards;
1 EX-123-45 724
1 EX-12A-B3 724 /* not all old ids have an override*/
5 EX-ABC-12 515
5 EX-ABC-14 953
7 22EX-ZYX-12 007
7 22EX-LMN-12 056
;
data new_id;
input location ID :$11. destination $ ;
cards;
1 22EX-123-45 894
1 55EX-123-45 785
5 22EX-987-45 999
5 88EX-ABC-14 881
7 22EX-ZYX-12 678
7 22EX-LMN-12 254
;
Now just join them and use COALESCE() function to pick the right value. Your join criteria needs to handle the values that start with 22
differently.
proc sql ;
create table want as
select a.location
, a.id as productid
, coalesce(b.destination,a.destination) as destination
from new_id a
left join old_id b
on (b.productid = substr(a.id,3) and a.id not eqt '22')
or (b.productid = a.id and a.id eqt '22')
order by 1,2
;
quit;
Upvotes: 0
Reputation: 1297
Keeping the original new_id can be solved by just creating a new variable to merge on.
data true_new new;
set new_id;
length match_id $9;
if substr(id, 1,2) = '22' then call missing(match_id);
else if substr(id,1,2) in ('88', '55') then match_id = substr(id,3);
else match_id = id;
if missing(match_id) then output true_new;
else output new;
run;
I would now match the new
with old_id
based on the match_id
field and then append true_new
to the resulting dataset. These can be done in one step, but I find this a bit more cleaner.
Upvotes: 0
Reputation: 51566
So if you can really tell which type of id is in the variable ID
by the length of the value then use that to decide whether or not you need to remove two characters.
if length(id)=9 then old_id=id;
else old_id=substr(id,3);
If the hyphens are also not consistently inserted you might want to also remove them.
if length(compress(id,'-'))=7 then old_id=compress(id,'-');
else old_id=compress(substr(id,3),'-');
Upvotes: 1