Raven

Reputation: 859

Pull Strings Before and After Key Words

Not sure if this is possible in SAS, although I'm slowly learning that pretty much anything is possible in SAS...

I have a data set of 600 patients, and within that data set I have a comment variable. The comment variable contains a few sentences each patient wrote about his/her care. For example, the data set looks like this:

 ID        Comment
 1         Today we have great service. everyone was really nice.
 2         The customer service team did not know what they were talking about and was rude.
 3         Everyone was very helpful 5 stars.
 4         Not very helpful at all.
 5         Staff was nice.
 6         All the people was really nice.

Let's say I identify a number of key words I'm interested in, for example nice, rude and helpful.

Is there a way to pull the two words that come before each of these key words and produce a frequency table like this?

 WORD            Frequency 
 Was Really Nice         2
 And Was Rude            1
 Was Very Helpful        1
 Not very helpful        1

I already have code that helps me identify the key words. It splits each comment into words and counts how often each word occurs in the comment variable.

 data PG_2 / view=PG_2;
    length word $20;
    set PG_1;
    /* scan() returns a blank once i runs past the last word */
    do i = 1 by 1 until(missing(word));
       word = upcase(scan(COMMENT, i));
       if not missing(word) then output;
    end;
    keep word;
 run;

 proc freq data=PG_2 order=freq;
    table word / out=wordfreq(drop=percent);
 run;

Upvotes: 1

Views: 193

Answers (1)

Ben Corcoran

Reputation: 69

Have you looked at the Perl regular expression (PRX) functions in SAS? I think they might solve your issue.

You can use regex capture groups to pull out the two words directly before your keyword using prxparse and prxposn. The code below should grab the two words before the first occurrence of the word nice in the comment variable and store them in the firstTwoStrings variable.

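data firstTwoStrings;
   length firstTwoStrings $200;
   retain re;
   /* compile the pattern once, on the first observation */
   if _N_ = 1 then
      re = prxparse('/(\w+ \w+) nice/'); /* change 'nice' to your desired keyword */
   set comments; /* your input data set, e.g. PG_1 in the question */
   /* prxmatch returns the position of the first match, or 0 if there is none */
   if prxmatch(re, COMMENT) then
      firstTwoStrings = prxposn(re, 1, COMMENT); /* capture group 1: the two preceding words */
run;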
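
The data step above handles one keyword and only the first match in each comment. If you want all three keywords and the frequency table from the question, something along these lines might work. It is an untested sketch: it assumes the input is PG_1 with ID and COMMENT variables as in the question, and the names phrases, phrase and phrasefreq are just placeholders I made up.

```sas
/* Sketch only: assumes PG_1 has ID and COMMENT, as in the question.
   phrases, phrase and phrasefreq are placeholder names. */
data phrases;
   length phrase $200;
   retain re;
   if _N_ = 1 then
      /* \b stops 'nice' from matching inside 'nicely'; the i flag ignores case */
      re = prxparse('/\b(\w+\s+\w+\s+(?:nice|rude|helpful))\b/i');
   set PG_1;
   start = 1;
   stop = length(COMMENT);
   /* call prxnext steps through every match in the comment, not just the first */
   call prxnext(re, start, stop, COMMENT, pos, len);
   do while (pos > 0);
      /* compbl collapses repeated blanks so equal phrases group together */
      phrase = upcase(compbl(substr(COMMENT, pos, len)));
      output;
      call prxnext(re, start, stop, COMMENT, pos, len);
   end;
   keep ID phrase;
run;

proc freq data=phrases order=freq;
   table phrase / out=phrasefreq(drop=percent);
run;
```

Because the pattern only allows whitespace between the words, a phrase will not reach across a full stop or comma. Note that a comment like "Staff was nice." has two words before the keyword, so it should be counted as well. For the words after a keyword, you could try moving the keyword group to the front of the pattern, e.g. `(?:nice|rude|helpful)\s+\w+\s+\w+`.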

Upvotes: 3
