STL
STL

Reputation: 293

Using Compress and Scan in SAS to create new variable

I am new to SAS and would like to keep only what's in between the parentheses of another column in my new column using compress and SCAN:

 x 
abc(animal)
efg(food)
hij(plant)

I tried:

DATA NEW 
    set OLD;
    y = (compress(scan([x], 2, '('), ')');
RUN;
PROC PRINT NEW; 
RUN; 

but I can't seem to get it to work. Any insight would be helpful.

Upvotes: 2

Views: 663

Answers (2)

Kiran
Kiran

Reputation: 3315

@tom answer is perfect and another way to do this is using prxsubstr.

 data have;
 input x $20.;
 datalines;
abc(animal)
efg(food)
hij(plant)
;

 prxparse \(.+?\) is to find anything between parenthesis
 \( -- starting of parenthesis
 .+?\( ---  anything till closing parenthesis

call prxsubstr captures where it finds pattern and length of it. then doing substr to get the values inside the parenthesis

data want;
set have;
if _n_= 1 then do;
retain re;
re = prxparse('/\(.+?\)/');
end;
call prxsubstr(re, trim(x), position, length);

if position gt 0 then 
 new=substr(trim(x),position+1, length-2);
run;

Upvotes: 3

Tom
Tom

Reputation: 51611

There should be no need for compress(), unless you want to remove some other characters from the value extracted. You might want to use left() to remove leading spaces from values like xxx( yyy ).

y=scan(x,2,'()');

If you have some values like (no first word) then you might need to use index of 1 instead of two. If so you could use =: to test if X starts with an open paren.

y=scan(x,1+(x^=:'('),'()');

If you know the () are always at the end you could use index of -1. But again if some strings had characters after the close paren then you would need to use -2 instead.

Upvotes: 2

Related Questions