Kermit
Kermit

Reputation: 3117

Concatenating a variable dynamically in SAS

I want to create a variable that resolves to the character before a specified character (*) in a string. However I am asking myself now if this specified character appears several times in a string (like it is in the example below), how to retrieve one variable that concatenates all the characters that appear before separated by a comma?

Example:

data have;
infile datalines delimiter=",";
input string :$20.;
datalines;
ABC*EDE*,
EFCC*W*d*
;
run;

Code:

data want;
    set have;
    cnt = count(string, "*");
    _startpos = 0;

    do i=0 to cnt until(_startpos=0); 
        before = catx(",",substr(string, find(string, "*", _startpos+1)-1,1));
    end;    
    drop i _startpos;
run;

That code output before=C for the first and second observation. However I want it to be before=C,E for the first one and before=C,W,d for the second observation.

Upvotes: 1

Views: 691

Answers (3)

whymath
whymath

Reputation: 1394

call scan is also a good choice to get position of each *.

data have;
infile datalines delimiter=",";
input string :$20.;
datalines;
ABC*EDE*,
EFCC*W*d*
****
asdf
;

data want;
  length before $20.;
  set have;

  do i = 1 to count(string,'*');
    call scan(string,i,pos,len,'*');
    before = catx(',',before,substrn(string,pos+len-1,1));
  end;
  put _n_ = +7 before=;
run;

Result:

_N_=1        before=C,E
_N_=2        before=C,W,d
_N_=3        before=
_N_=4        before=

Upvotes: 1

Richard
Richard

Reputation: 27498

You can use Perl regular expression replacement pattern to transform the original string.

Example:

data have;
infile datalines delimiter=",";
input string :$20.;
datalines;
ABC*EDE*,
EFCC*W*d*
;

data want;
  set have;
  csl = prxchange('s/([^*]*?)([^*])\*/$2,/',-1,string);  /* comma separated letters */
  csl = prxchange('s/, *$//',1,csl);                     /* remove trailing comma */
run;

enter image description here

Upvotes: 2

Tom
Tom

Reputation: 51566

Make sure to increment _STARTPOS so your loop will finish. You can use CATX() to add the commas. Simplify selecting the character by using CHAR() instead of SUBSTR(). Also make sure to TELL the data step how to define the new variable instead of forcing it to guess. I also include test to handle the situation where * is in the first position.

data have;
  input string $20.;
datalines;
ABC*EDE*
EFCC*W*d*
*XXXX*
asdf
;

data want;
  set have;
  length before $20 ;
  _startpos = 0;
  do cnt=0 to length(string) until(_startpos=0); 
    _startpos = find(string,'*',_startpos+1);
    if _startpos>1 then before = catx(',',before,char(string,_startpos-1));
  end;    
  cnt=cnt-(string=:'*');
  drop i _startpos;
run;

Results:

Obs    string       before    cnt

 1     ABC*EDE*     C,E        2
 2     EFCC*W*d*    C,W,d      3
 3     *XXXX*       X          1
 4     asdf                    0

Upvotes: 1

Related Questions