Reputation: 57
I am trying to search a string for multiple words and remove any of them that are found. I wrote the code below, which seems to work on some words but not all, and when it does work, it only works on the last word in the string.
data readyinput;
set readyforstreetname(obs=200);
array cw (48) $11 (' ave ',' avenue ',' blvd ',' boulevard ',' cir ',' circle ',' court ',' ct ',' drive ',' dr ',' e ',' east ',' highway ',' hwy ',' lane ',' ln ',' north ',' n ',' nw ',' northwest ',' parkway ',' pkwy ',' pl ',' place ',' pl ',' plaza ',' rd ',' road ',' route ',' route ',' rte ',' rte ',' rt ',' rt ',' s ',' south ',' se ',' southeast ',' st ',' street ',' suite ',' ste ',' sw ',' southwest ',' w ',' west ',' apartment ',' apt ');
do i=1 to dim(cw);
if indexw(lowcase(address_input),cw[i])
then
do;
add = upcase(tranwrd(lowcase(address_input),cw[i],''));
end;
end;
drop cw:;
run;
Basically what I'm trying to do is strip an address of all common words then parse out the street number and street name, which would be done in a later step.
Upvotes: 0
Views: 4141
Reputation: 1120
Not sure that arrays make this simpler. You can just loop through your list of words to remove and replace each one with a blank. You may also want to run COMPBL() on the resulting address variable so that any double or triple blanks are collapsed to a single blank at the end.
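/* Space-delimited list, unquoted: %scan would otherwise return the quotes as part of each word. */
%let words_to_ignore = word1 word2 word3 ... wordN;
%macro remove_words;
data your_data2;
set your_data;
/* Generate one TRANWRD call per word in the list. */
%do i = 1 %to %sysfunc(countw(&words_to_ignore., %str( )));
%let this_word = %scan(&words_to_ignore., &i., %str( ));
/* Note: TRANWRD matches substrings, not only whole words. */
address = tranwrd(address, "&this_word.", "");
%end;
/* Collapse the runs of blanks left behind by the removals. */
address = compbl(address);
run;
%mend remove_words;
%remove_words;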
Upvotes: 0
Reputation: 51611
Your problem is that every time you try to remove a word you are starting with the original string instead of the string as modified by the earlier words, so each assignment overwrites the result of the previous one.
add=lowcase(address_input);
do i=1 to dim(cw);
if indexw(add,cw[i]) then add = tranwrd(add,cw[i],'');
end;
add = upcase(add);
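You probably also need to change how you are finding and replacing the words. Each array element is padded with trailing blanks to its declared length ($11), and TRANWRD() does not trim its target, so the shorter words can only match where they are followed by that many blanks, which is typically at the end of the string. I find that it works better to build a trimmed target with CATS() and give INDEXW() a non-blank word delimiter.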
data test ;
array cw (2) $10 _temporary_ ('N','ST');
input address $80. ;
new=address;
new = cats('|',translate(upcase(left(compbl(new))),'|',' '),'|');
do i=1 to dim(cw) ;
if indexw(new,cats('|',cw(i),'|'),'|') then new=tranwrd(new,cats('|',cw(i),'|'),'|');
end;
new = translate(new,' ','|');
put address= / new= ;
cards;
N Main St
;;;;
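The same delimiter technique could be applied to the data set from the question along these lines. This is only a sketch under a few assumptions: the six-word list is a shortened, hypothetical stand-in for the full 48-word list, the $200 length for add is a guess, and STRIP() is used (instead of LEFT()) so that no trailing blank is turned into an extra delimiter.

```sas
data readyinput;
  set readyforstreetname(obs=200);
  /* Hypothetical shortened list; the words carry no padding blanks. */
  array cw (6) $10 _temporary_ ('AVE','AVENUE','ST','STREET','N','NORTH');
  length add $200;  /* assumed length, adjust to the real data */
  /* Upper-case, collapse blanks, then wrap every word in '|' so that */
  /* the first and last words are delimited on both sides as well.   */
  add = cats('|', translate(upcase(strip(compbl(address_input))), '|', ' '), '|');
  do i = 1 to dim(cw);
    /* CATS() trims the array element, so the target is exactly |WORD|. */
    /* Each pass works on the string left by the previous pass.         */
    add = tranwrd(add, cats('|', cw[i], '|'), '|');
  end;
  /* Turn the delimiters back into blanks and tidy up. */
  add = strip(compbl(translate(add, ' ', '|')));
  drop i;
run;
```

Because neighbouring words share a delimiter, the same word appearing twice in a row (for example N N) would have only its first occurrence removed in a single pass; if that can occur in the data, the loop may need to be run a second time.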
Upvotes: 1