G Man

SAS: writing text to a text file produces extra spaces that do not appear in the LOG

I have a dataset of address standardizations -- take STREET and replace it with ST -- and wish to write code that does the substitution. When testing out the code, it appears as intended in the LOG file but extra spaces are added when I write to the text file. I don't want the extra spaces.

-=-=-=-=-=- SAS CODE

   data std ;
    
        length pre $16 post $8 ;
        infile datalines delimiter=',' ;
        input pre $ post $ ;
        
    pre = strip(pre);
    post = strip(post);
    
    datalines;  
    AVENUES ,   AVE
    AVENUE  ,   AVE
    BOULEVARD   ,   BLVD
    CIRCLE  ,   CIR
    ;
    run;
    
    data _null_ ;
    
        file "&test.txt";
        set std ;
        
    p1  =   trim(pre) ;
    p2  =   trim(post);
        
    put '&var = strip( prxchange("s/(^|\s)' p1 +(-1) '\s/ ' p2 +(-1) ' /i",-1,&var) );' ;
    
    run;

-=-=-=-=-=-=-=- END OF CODE

The SAS code produces the following ...

&var = strip( prxchange("s/(^|\s)AVENUES\s/ AVE /i",-1,&var) );

 &var = strip( prxchange("s/(^|\s)AVENUE\s/ AVE /i",-1,&var) );

 &var = strip( prxchange("s/(^|\s)BOULEVARD\s/ BLVD /i",-1,&var) );

... in the LOG file when I remove the file statement, but writes ...

&var = strip( prxchange("s/(^|\s)AVENUES    \s/     AVE /i",-1,&var) );

&var = strip( prxchange("s/(^|\s)AVENUE \s/     AVE /i",-1,&var) );

&var = strip( prxchange("s/(^|\s)BOULEVARD  \s/     BLVD /i",-1,&var) );

... with extra spaces inside the REGEX function in the file test.txt.

This is SAS 9.4 which I'm using through a web-based SAS Studio.

Answers (1)

Joe

So, your problem is based on how SAS stores character variables.

A character variable is always equal to the characters stored in that variable, followed by as many space ('20'x) characters as needed to fill the length of the data storage. This differs from (mostly newer) languages that have a string terminator character or similar; SAS has no such character, it just pads the storage with spaces. So if the variable is 8 bytes long and contains Avenue, then it actually contains 'Avenue  ' (Avenue followed by two spaces).

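You cannot change that in code, except within a single expression: as soon as a trimmed value is assigned to a variable, it is padded again to that variable's length. So, your lines: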

p1  =   trim(pre) ;
p2  =   trim(post);

are meaningless: they do nothing except waste CPU time (sadly, they are not optimized away, from what I can tell).
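A minimal sketch of that padding, using the variable names and lengths from the question (the literal 'AVENUE' and the names len_stored and len_trimmed are only for illustration). LENGTHC reports the stored length including trailing blanks, while LENGTH ignores them:

```sas
data _null_;
    length pre $16;
    pre = 'AVENUE';
    p1 = trim(pre);              /* p1 inherits length 16, so the value is padded again */
    len_stored  = lengthc(p1);   /* stored length, trailing blanks included */
    len_trimmed = length(p1);    /* length without trailing blanks */
    put len_stored= len_trimmed=;
    /* formatted output writes all 16 bytes, padding included */
    put '[' p1 $char16. ']';
run;
```

This should report a stored length of 16 and a trimmed length of 6, with the padding visible between the brackets.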

You need to trim in the same statement where you use the value, because only there does the trimmed result survive. Now, you can't call trim(...) inside a PUT statement, so you need to compose the line to be written beforehand, or else use the $varying. format.

Here's one example:

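/* temporary file for the generated code */
filename tempfile temp;

data _null_ ;
    file tempfile;
    set std ;
    /* cats() and catx() strip leading and trailing blanks from each argument,
       so the padding never reaches the output line */
    result = cats('&var = strip( prxchange("s/(^|\s)',pre,catx(' ','\s/',post,'/i",-1,&var) );'));
    put result ;
run;

/* read the file back and echo each line to the log to check it */
data _null_;
    infile tempfile;
    input @;
    put _infile_;
run;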

Here's an example using $varying.:

data _null_ ;
    file tempfile;
    set std ;
    /* length() ignores trailing blanks; $varying. writes only that many characters */
    varlen_pre = length(pre);
    varlen_post = length(post);
    put '&var = strip( prxchange("s/(^|\s)' pre $varying16. varlen_pre  '\s/ ' post $varying8. varlen_post  ' /i",-1,&var) );' ;
run;

As to why your log doesn't match the file, that's because SAS has slightly different rules for when it writes to logs than when it writes to files. It's much more exact about what it writes to a file; you say it, it writes it. For logs it has a few places where it removes spaces for you, presumably to make logs more readable, as it's not as necessary to be precise. This can be a pain when you DO want the precision in the log, and of course in your case where you want the log to match what you're seeing...


Finally, a note on what you're doing. I don't highly recommend using a regex the way you're using it. It's very slow. Unless you're only doing a handful of replacements, or only have a small dataset size, or really don't care how long this takes...

If it's just 1:1 replacements, I'd recommend tranwrd, which is much faster. See just this small comparison:

data in_data;
  do _n_ = 1 to 1e5;
    address = catx(' ',rand('Integer',1,9999),'Avenue');
    output;
    address = catx(' ',rand('Integer',1,9999),'Street');
    output;
    address = catx(' ',rand('Integer',1,9999),'Boulevard');
    output;
    address = catx(' ',rand('Integer',1,9999),'Circle');
    output;
    address = catx(' ',rand('Integer',1,9999),'Route');
    output;
    
  end;
run;

data want;
  set in_data;

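  /* the o option asks SAS to compile each pattern only once */
  rx_ave  = prxparse('s/(^|\s)Avenue\s/ Ave /ios');
  rx_st   = prxparse('s/(^|\s)Street\s/ St /ios');
  rx_blvd = prxparse('s/(^|\s)Boulevard\s/ Blvd /ios');
  rx_cir  = prxparse('s/(^|\s)Circle\s/ Cir /ios');

  /* loop over the pattern ids explicitly rather than assuming they are 1 to 4 */
  array rx[4] rx_ave rx_st rx_blvd rx_cir;
  do i = 1 to 4;
    address = prxchange(rx[i],-1,address);
  end;
run;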
run;

data want;
  set in_data; 
    address = tranwrd(address,'Avenue','Ave');
    address = tranwrd(address,'Street','St');
    address = tranwrd(address,'Boulevard','Blvd');
    address = tranwrd(address,'Circle','Cir');
run;

Both work the same - given you're already using 'words' anyway - but the second works in basically the time it takes to write out the dataset (0.15s for me), while the first takes 20s of CPU time on my SAS server. Loading the regex library is really slow.
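If the 1:1 approach fits your data, the generator step could emit tranwrd calls instead of prxchange calls. This is only a sketch: it assumes the std dataset from the question, reuses the tempfile fileref, and inherits tranwrd's behavior of replacing every matching substring rather than whole words only.

```sas
/* temporary file for the generated code */
filename tempfile temp;

data _null_;
    file tempfile;
    set std;                     /* std: the pre/post lookup dataset from the question */
    length result $200;
    /* cats() strips leading and trailing blanks from pre and post */
    result = cats('&var = tranwrd(&var,"', pre, '","', post, '");');
    put result;
run;
```

For the first row of std this should write a line of the form &var = tranwrd(&var,"AVENUES","AVE"); with no padding inside the quotes. The single quotes around the literals keep &var from being resolved while the file is being written.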
