Phil

Reputation: 57

Replacing Multiple Words in String in SAS

I am trying to search a string for multiple words and remove any of them that are found. I wrote the code below, which seems to work on some words but not all, and when it does work, it only works on the last word in the string.

data readyinput;
set readyforstreetname(obs=200);

array cw (48) $11 (' ave ',' avenue ',' blvd ',' boulevard ',' cir ',' circle ',' court ',' ct ',' drive ',' dr ',' e ',' east ',' highway ',' hwy ',' lane ',' ln ',' north ',' n ',' nw ',' northwest ',' parkway ',' pkwy ',' pl ',' place ',' pl ',' plaza ',' rd ',' road ',' route ',' route ',' rte ',' rte ',' rt ',' rt ',' s ',' south ',' se ',' southeast ',' st ',' street ',' suite ',' ste ',' sw ',' southwest ',' w ',' west ',' apartment ',' apt ');

do i=1 to dim(cw);

if indexw(lowcase(address_input),cw[i])
then 
do;
    add = upcase(tranwrd(lowcase(address_input),cw[i],''));
end;    
end;


drop    cw:;
run;

Basically what I'm trying to do is strip an address of all common words then parse out the street number and street name, which would be done in a later step.

Upvotes: 0

Views: 4141

Answers (2)

Sean

Reputation: 1120

I'm not sure that arrays make this simpler. You can just loop through your list of words to remove and replace each one with a blank. You may also want to run the resulting address variable through COMPBL at the end so that any double or triple blanks are collapsed to one.

%let words_to_ignore = "word1" "word2" "word3" ... "wordN";

%macro remove_words;

    data your_data2;
        set your_data;
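        /* Split the list on blanks only, so each quoted word stays intact */
        %do i = 1 %to %sysfunc(countw(&words_to_ignore., %str( )));
            %let this_word = %scan(&words_to_ignore., &i., %str( ));
            /* &this_word. already carries its own double quotes */
            address = tranwrd(address, &this_word., "");
        %end;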
        address = compbl(address);
    run;    


%mend remove_words;

%remove_words;

Upvotes: 0

Tom

Reputation: 51611

Your problem is that every time you try to remove a word you are starting with the original string instead of the string as modified by the earlier words.

add=lowcase(address_input);
do i=1 to dim(cw);
  if indexw(add,cw[i]) then 
    add = tranwrd(add,cw[i],'')
  ;
end;    
add = upcase(add);

You probably also need to change how you find and replace the words. I find it works better to give INDEXW() a non-blank word delimiter.

data test ;
  array cw (2) $10 _temporary_ ('N','ST');
  input address $80. ;
  new=address;
  new = cats('|',translate(upcase(left(compbl(new))),'|',' '),'|');
  do i=1 to dim(cw) ;
    if indexw(new,cats('|',cw(i),'|'),'|') then
      new=tranwrd(new,cats('|',cw(i),'|'),'|')
    ;
  end;
  new = translate(new,' ','|');
  put address= / new= ;
cards;
N Main St
;;;;
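
For comparison, a blank-delimited variant of the same idea can be sketched as follows. This is only an illustration under stated assumptions: the dataset name `demo`, the shortened word list, and the sample addresses are made up, and blank padding is swapped in for the `|` delimiter used above. It relies on two documented behaviors: character array elements are padded with blanks to their declared length, and TRANWRD does not trim trailing blanks from its target, so each element is passed through STRIP() before it is wrapped in single blanks.

```sas
data demo;
  /* Hypothetical, shortened word list; _temporary_ keeps it out of the output */
  array cw (4) $11 _temporary_ ('ave','avenue','st','street');
  length add $82;
  input address_input $80.;
  /* Leading blank so the first word is blank-delimited too;
     the variable's trailing padding supplies the blank after the last word */
  add = ' ' || lowcase(address_input);
  do i = 1 to dim(cw);
    /* STRIP removes the element's padding so the target is exactly ' word ' */
    add = tranwrd(add, ' ' || strip(cw[i]) || ' ', ' ');
  end;
  /* Collapse repeated blanks, left-align, and restore upper case */
  add = upcase(left(compbl(add)));
  drop i;
datalines;
123 Main St
45 Oak Avenue
;
run;
```

Because every replacement starts from `add` rather than from `address_input`, earlier removals are preserved. One known gap in this sketch: two identical removable words directly next to each other share a single blank, so the second one may be left behind.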

Upvotes: 1
