cs0815
cs0815

Reputation: 17388

Removal of ASCII characters in datasteps

Looking at this and this code:

data Work.SomeDataset;
    set Work.SomeDataset;
    SomeColumn = compress(SomeColumn, byte(34));
    SomeColumn = compress(SomeColumn, byte(38));
run;

Am I right in thinking that this removes the characters " (byte(34)) and & (byte(38)) from SomeColumn? If so, is there a more efficient way of achieving this?

Upvotes: 2

Views: 161

Answers (3)

Joe
Joe

Reputation: 63424

Yes, that removes those ascii characters. You can do this in one COMPRESS call, by concatenating, and of course you could just pass '&"' to the function but I assume you have a reason for not doing so.

You might also want to use COMPBL to remove the extra spaces that might be created by this process, if that is a problem.

data Work.want;
  set Work.have;
  someColumn = compress(someColumn,byte(34)||byte(38));
  put SomeColumn=;
run;

You could also use UPDATE (SQL) or MODIFY (data step) to avoid rewriting the dataset...

proc sql;
  update have
    set someColumn = compress(someColumn,byte(34)||byte(38));
quit;

Perhaps more efficient, perhaps not; ultimately probably no different if you cannot use a WHERE statement to filter out cases that don't need updating, but if you can (if it's rare to have this for example) it might be more efficient.

Edit: Modify looks like it's not the answer. In a quick test with 1e7 rows, MODIFY is strictly slower in real time AND has far more CPU time.

SQL UPDATE also is pretty slow, so I wouldn't use either of those unless you have a very rare update scenario (but test in your situation).

Upvotes: 1

DomPazz
DomPazz

Reputation: 12465

You can put the characters into a single call.

data have;
format a $12.;
a = "A" || byte(34) || "B";
output;
a = "A" || byte(34) || "B" || byte(36) || "C";
output;
run;

data want;
set have;
b = compress(a,byte(34)||byte(36));
run;

Upvotes: 1

user667489
user667489

Reputation: 9569

This is already fairly close to optimal, but for a slight improvement I'd suggest:

data Work.SomeDataset;
    set Work.SomeDataset;
    SomeColumn = compress(SomeColumn,'&"');
run;

This makes it clearer which characters are being removed and removes both in the same pass.

Upvotes: 3

Related Questions