Not sure if this is possible in SAS, although I'm slowly learning that pretty much anything is possible in SAS...
I have a data set of 600 patients that includes a comment variable. The comment variable contains a few sentences each patient wrote about his/her care. For example, the data set looks like this:
ID Comment
1 Today we have great service. everyone was really nice.
2 The customer service team did not know what they were talking about and was rude.
3 Everyone was very helpful 5 stars.
4 Not very helpful at all.
5 Staff was nice.
6 All the people was really nice.
Let's say I identify a number of keywords I'm interested in, for example nice, rude and helpful.
Is there a way to pull the two words that come before each of these keywords and produce a frequency table of the resulting phrases, like this?
WORD Frequency
Was Really Nice 2
And Was Rude 1
Was Very Helpful 1
Not very helpful 1
I already have code that helps me identify the keywords. It splits the comment variable into individual words and counts the frequency of each word:
    data PG_2 / view=PG_2;
        length word $20;
        set PG_1;
        do i = 1 by 1 until(missing(word));
            word = upcase(scan(COMMENT, i));
            if not missing(word) then output;
        end;
        keep word;
    run;

    proc freq data=PG_2 order=freq;
        table word / out=wordfreq(drop=percent);
    run;
Upvotes: 1
Views: 193
Have you looked at the Perl regular expression (PRX) functions in SAS? I think they might solve your issue.
You can use a regex capture group to pull out the two words directly before your keyword using prxparse and prxposn. The code below grabs the two words immediately before the first occurrence of the word nice in the comment variable and stores them in the firstTwoStrings variable. The match is case-sensitive and expects single spaces between words.
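    data firstTwoStrings;
        length firstTwoStrings $200;
        retain re;
        /* Compile the pattern once, on the first iteration */
        if _N_ = 1 then
            re = prxparse('/(\w+ \w+) nice/'); /*change 'nice' to your desired keyword*/
        set comments; /* your input data set */
        /* prxmatch returns the match position, or 0 when there is no match */
        if prxmatch(re, COMMENT) then
        do;
            /* Capture group 1 holds the two words before the keyword */
            firstTwoStrings = prxposn(re, 1, COMMENT);
        end;
    run;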
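To cover several keywords at once and get a frequency table like the one in the question, the same idea could be extended roughly as follows. This is only a sketch: it assumes the input data set is PG_1 with variables ID and COMMENT, as in the question, and the names phrases, phrasefreq, phrase, pos and len are made up for illustration. It uses call prxnext to walk through every match in a comment rather than only the first, and it keeps the keyword as part of the extracted phrase.

```sas
data phrases;
    length phrase $200;
    retain re;
    /* Compile once: two words followed by one of the keywords; the i flag ignores case */
    if _N_ = 1 then
        re = prxparse('/\b\w+\s+\w+\s+(?:nice|rude|helpful)\b/i');
    set PG_1; /* assumed input: variables ID and COMMENT */
    start = 1;
    stop = length(COMMENT);
    /* CALL PRXNEXT moves start past each match, so the loop visits every match */
    call prxnext(re, start, stop, COMMENT, pos, len);
    do while (pos > 0);
        /* Upper-case the phrase so that differently cased matches are counted together */
        phrase = upcase(substr(COMMENT, pos, len));
        output;
        call prxnext(re, start, stop, COMMENT, pos, len);
    end;
    keep ID phrase;
run;

/* Count how often each three-word phrase occurs */
proc freq data=phrases order=freq;
    table phrase / out=phrasefreq(drop=percent);
run;
```

Matches do not overlap, so if one keyword falls within the two words before another keyword, only the first phrase is counted. A keyword with fewer than two words in front of it is skipped.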
Upvotes: 3