user1284978

Reputation: 13

SAS: how to remove the first word in a string if it equals a word in another variable

I have variable1 with strings such as "asdfsad What do you do", "qwer What is your name", "Zebra"

And variable2 with strings "asdfsad", "qwer", "Animal"

I want to remove the first word from the strings in variable1 if it equals the word in variable2. The only thing I can come up with so far is to replace each word separately:

e.g. variable1=tranwrd(variable1, "asdfsad", ""); and so on. However, I have many words to replace.

Many thanks for your help.

Upvotes: 1

Views: 17840

Answers (4)

Josh Bode

Reputation: 3742

This is probably not going to be efficient or feasible for thousands of words, but you could use a Perl regular expression substitution (e.g. s/search/replacement/) via prxchange:

/* words to match delimited by "|" */
%let words = asdfsad|qwer|Animal|foo|bar|horse;

/* example data */
data example;
  infile datalines dlm=',' dsd;
  input string: $256.;
datalines;
asdfsad What do you do
qwer What is your name
Zebra
food is in the fridge
foo    A horse entered a bar
;
run;

/* cleaned data */
data example_clean;
  set example;

  /*
    regular expression is:
      - created once on first row (_n_ = 1)
      - cached (retain regex)
      - dropped at the end (drop regex).
  */
  if _n_ = 1 then do;
    retain regex;
    drop regex;
    regex = prxparse("s/^(&words)\s+//");
  end;

  string = prxchange(regex, 1, string);  /* apply the regex (once) */
run;

The ^ symbol in the regular expression (constructed in prxparse) anchors the match to the start of the string, the | symbols make it an 'or' match, and the \s+ matches one or more whitespace characters (which is why, in my example, "food" is not matched).
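If the words live in a dataset rather than being typed by hand, the "|"-delimited list could be built with PROC SQL. This is only a rough sketch: the dataset word_list and its column word are made-up names for illustration, and it assumes the words contain no regular expression metacharacters (those would need escaping) and that the combined list fits within the macro variable length limit.

```sas
/* hypothetical lookup table of words to strip; names are illustrative */
data word_list;
  length word $32;
  input word $;
datalines;
asdfsad
qwer
Animal
;
run;

/* collapse the column into one "|"-delimited macro variable */
proc sql noprint;
  select distinct strip(word)
    into :words separated by '|'
    from word_list;
quit;

/* write the resulting list to the log for checking */
%put NOTE: words=&words;
```

The resulting &words can then be used in the prxparse call in place of the hand-written %let. Note that this removes any listed word from the start of any row, which is not the same as comparing each row against its own variable2.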

Upvotes: 0

DavB

Reputation: 1696

/* compare the first blank-delimited word; left() drops the blank left behind */
if scan(variable1, 1, " ") = variable2 then
  variable1 = left(substr(variable1, index(variable1, " ")));

Upvotes: 0

Nelson Wu

Reputation: 98

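If you are happy with the results from tranwrd, you can use that too. You just need to be careful with whitespace, and keep in mind that tranwrd replaces every occurrence of the target anywhere in the string, not just the first word: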

variable1 = strip(tranwrd(variable1, strip(variable2), ''));

Upvotes: 0

Robert Penridge

Reputation: 8513

How about something like this:

data sample;
  length variable1 variable2 $100;
  variable1= "asdfsad What do you do"; variable2 = "asdfsad"; output;
  variable1= "qwer What is your name"; variable2 = "qwer";    output;
  variable1= "Zebra"                 ; variable2 = "Animal";  output;
run;

data fixed;
  length first_word $100;

  set sample;

  first_word = scan(variable1,1);
  if first_word eq variable2 then do;
    start_pos = length(first_word) + 1;
    variable1 = substr(variable1,start_pos); 
  end;
run;

This will work for matching on the entire first word. It leaves the spaces or other punctuation in the remaining text, but you should be able to change that easily if you like.
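One way to drop the leftover leading blank is to wrap the substr call in left(). The following is a minimal sketch rather than a definitive version: it assumes variable1 has no leading blanks to begin with, treats a blank as the only word delimiter, and reuses the sample data from above.

```sas
data sample;
  length variable1 variable2 $100;
  variable1 = "asdfsad What do you do"; variable2 = "asdfsad"; output;
  variable1 = "Zebra";                  variable2 = "Animal";  output;
run;

data fixed;
  set sample;
  length first_word $100;

  /* blank is the only delimiter, so punctuation stays part of the word */
  first_word = scan(variable1, 1, " ");

  if first_word eq variable2 then
    /* left() shifts the remaining text to the start of the variable */
    variable1 = left(substr(variable1, length(first_word) + 1));

  drop first_word;
run;
```

For the first row this should leave variable1 as "What do you do"; the "Zebra" row is untouched because its first word does not equal variable2.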

If your problem is to match character-by-character and not on the entire first word then that would be a very different question and I would recommend posting a new question.

Upvotes: 2
