BluesGotTheCup
BluesGotTheCup

Reputation: 43

SAS substr merge

i think i know all of the parts on how to do this problem i have just had issues putting them together (outside of one small issue).

I have 2 data sets. one called "old_id" one called "new_id". The old "ID" is 7 characters (9 including hyphens... example : EX-123-45). The new ID format is 9 characters (11 including hypens... example : 88EX-123-45). there are some old_ids that were converted to new_ids but had an 88 or a 55 added in front of them. All other new ids have 22 in front of them and i need to avoid these because they have duplicate id's as the "old_id" but with 22 in front of them.

what i want to do with these is be able to match the info for the "old_ids" to the "new_ids" that are the new id's for the old id's (lol).

i know i can use

data ID_5;
set new_id;
ID5 = substr(right(id),2);
run;

this takes out the leading numbers and i could match on this. with the two files but then i have no way of getting the leading variables back (as far as i know). could i write an if statement to match IF the last 9 characters in "ID" in new_id match the old_id. IF the leading characters for ID in new_id = 88 or 55.

example of what the data i have looks like and what i want the end result

data for old_id.. this also contains 22 overrides (only about 20 though)

location ProductID    destination 
1        EX-123-45    724
1        EX-12A-B3    724      /* not all old ids have an override*/
5        EX-ABC-12    515
5        EX-ABC-14    953
7        22EX-ZYX-12  007
7        22EX-LMN-12  056

new_id file

location ID          destination
1        22EX-123-45  894
1        55EX-123-45  785
5        22EX-987-45  999
5        88EX-ABC-14  881
7        22EX-ZYX-12  678
7        22EX-LMN-12  254

what i want it to look like

location ProductID  destination 
1        22EX-123-45  894
1        55EX-123-45  724  /*old_id overrides destination*/
5        22EX-987-45  999
5        88EX-ABC-14  953  /*old_id overrides destination*/
7        22EX-ZYX-12  007  /*overwrites new_id destination*/
7        22EX-LMN-12  056  /*overwrites new_id destination*/

the majority of the 22's do not have overrides

Upvotes: 0

Views: 477

Answers (3)

Tom
Tom

Reputation: 51566

First let's convert your listings into actual datasets we can code with.

data old_id ;
  input location ProductID :$11.  destination $ ;
cards;
1  EX-123-45  724
1  EX-12A-B3  724      /* not all old ids have an override*/
5  EX-ABC-12  515
5  EX-ABC-14  953
7  22EX-ZYX-12  007
7  22EX-LMN-12  056
;
data new_id;
  input location ID :$11. destination $ ;
cards;
1  22EX-123-45 894
1  55EX-123-45 785
5  22EX-987-45 999
5  88EX-ABC-14 881
7  22EX-ZYX-12  678
7  22EX-LMN-12  254
;

Now just join them and use COALESCE() function to pick the right value. Your join criteria needs to handle the values that start with 22 differently.

proc sql ;
 create table want as
   select a.location
        , a.id as productid
        , coalesce(b.destination,a.destination) as destination
   from new_id a
   left join old_id b
   on (b.productid = substr(a.id,3) and a.id not eqt '22')
   or (b.productid = a.id and a.id  eqt '22')

   order by 1,2
 ;
quit;

Upvotes: 0

SAS2Python
SAS2Python

Reputation: 1297

Keeping the original new_id can be solved by just creating a new variable to merge on.

data true_new new;
  set new_id;

  length match_id $9;

  if substr(id, 1,2) = '22' then call missing(match_id);
  else if substr(id,1,2) in ('88', '55') then match_id = substr(id,3);
  else match_id = id;

  if missing(match_id) then output true_new;
  else output new;

run;

I would now match the new with old_id based on the match_id field and then append true_new to the resulting dataset. These can be done in one step, but I find this a bit more cleaner.

Upvotes: 0

Tom
Tom

Reputation: 51566

So if you can really tell which type of id is in the variable ID by the length of the value then use that to decide whether or not you need to remove two characters.

if length(id)=9 then old_id=id;
else old_id=substr(id,3);

If the hyphens are also not consistently inserted you might want to also remove them.

if length(compress(id,'-'))=7 then old_id=compress(id,'-');
else old_id=compress(substr(id,3),'-');

Upvotes: 1

Related Questions