Reputation: 355
I have a variable "comments" that holds individual comments by users. Some words in it are sensitive, such as usernames, and I need to remove them from the string. All usernames start with the same 3 letters, continue with characters that differ from user to user, and are 8 characters long in total. They appear in different places in each string, if they appear at all, and so far I haven't found a way to remove them. My first thought was to use TRANWRD(), but I don't think that SAS function supports wildcards. Does anybody know of a solution? Thanks so much for your time!
Upvotes: 1
Views: 110
Reputation: 7602
I would use a Perl regular expression for this. Their search criteria are flexible enough to express "starts with abc and is exactly 8 characters long". The example below removes only the 3rd and 4th words from the string, i.e. only the words that match those exact criteria.
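data test;
infile datalines truncover;
input comments $50.;
regexid = prxparse('s/\babc\w{5}\b//'); /* 'abc' at the start of a word, then exactly 5 word characters (letters, digits or _), then a word boundary (a space, punctuation or the end of the string) */
call prxchange(regexid,-1,comments); /* remove usernames; -1 replaces every match */
comments = compbl(comments); /* collapse the double blanks left where usernames were removed */
drop regexid;
datalines;
abc abc123 abc12345 abc98765 abc123456
;
run;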
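If you don't need to keep a regex ID around, the PRXCHANGE function (as opposed to the CALL PRXCHANGE routine) does the same substitution in one assignment. This is a minimal sketch that assumes, as in the question, that usernames are "abc" plus exactly 5 word characters; the data set name users_removed and the second sample line are made up for illustration.

```sas
data users_removed;
  infile datalines truncover;
  input comments $50.;
  /* PRXCHANGE(pattern, times, source): -1 = replace every match.
     \babc\w{5}\b matches only whole 8-character words starting with abc.
     COMPBL collapses the double blanks left behind by the removal. */
  comments = compbl(prxchange('s/\babc\w{5}\b//', -1, comments));
  datalines;
abc abc123 abc12345 abc98765 abc123456
Thanks abc12345 and abc98765!
;
run;
```

The first line becomes "abc abc123 abc123456" and the second "Thanks and !". Because the pattern is a constant, SAS compiles it only once, so there is no need to cache it yourself.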
Upvotes: 2