Leksa99

Reputation: 71

How can I keep the length of existing character variables when creating a new dataset in PROC IML?

I have a dataset that I manipulate in PROC IML, and I then create a new dataset from some of the manipulated values. When I write the character values out, the length of one variable changes from 7 to 9.

This doesn't cause a real problem, apart from a minor annoyance: when I later merge the new dataset, I get a warning that the variable has different lengths in the two datasets.

Is there a way to keep the length of the original variable?

Sample code

data data1;
infile datalines delimiter=',';

input classif :$9. time :$7.;
datalines;
05, 2021_11
051, 2021_11
;
run;

proc iml;
    use work.data1;
    read all var {classif time} into _temp_1;
    classif = _temp_1[,1];
    time    = _temp_1[,2];
    close;
    create work.data2 var {classif time};
    append;
quit;

Observe how the length of time is 7 in data1, but 9 in data2.

Upvotes: 0

Views: 156

Answers (3)

Tom

Reputation: 51566

If you want the variables from DATA1 to be defined the same in DATA2 you could just add a data step after your PROC IML code.

data data2;
  set data1(obs=0) data2;
run;

It works because SAS defines each variable the first time it is seen. Here the variables take their attributes from DATA1, even though the OBS=0 dataset option prevents any observations from actually being read from DATA1.
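One way to check the result is to list the stored lengths in both datasets. The query below is only a sketch and is not part of the original answer; it assumes both datasets live in the WORK library under the names used in the question, and it reads the standard DICTIONARY.COLUMNS view.

```sas
/* Compare the stored lengths of the variables in DATA1 and DATA2. */
/* Library and member names are stored in uppercase in this view.  */
proc sql;
  select memname, name, type, length
    from dictionary.columns
    where libname = 'WORK'
      and memname in ('DATA1', 'DATA2')
    order by name, memname;
quit;
```

If the extra data step did its job, each variable should show the same length in both rows.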

Upvotes: 2

Rick

Reputation: 1210

As @Richard explained, this happens when you read two character variables that have different lengths into columns of a common matrix. I can think of at least three workarounds. Depending on your application, one of these methods might be more convenient than others.

proc iml;
/* Option 1: Read variables into vectors, not a matrix */
use work.data1;
read all var {classif time };
close;
print (nleng(time))[L="nleng(time)"];

/* Option 2: Allocate time to have LENGTH=7 and copy the data in */
use work.data1;
read all var {classif time } into _temp_1;
close;
time = j(nrow(_temp_1), 1, BlankStr(7));  /* allocate char vector */
time[,]   = _temp_1[,2];                  /* copy the data */
print (nleng(time))[L="nleng(time)"];

/* Option 3: Read into a table instead of a matrix. */
tbl = TableCreateFromDataset("work", "data1") ;
classif = TableGetVarData(tbl, {"Classif"});
time = TableGetVarData(tbl, {"time"});
print (nleng(time))[L="nleng(time)"];
quit;
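To connect Option 1 back to the question, the following is a minimal sketch of the full round trip. It assumes work.data1 from the question already exists and simply drops the INTO clause from the original code, so each variable is read into its own vector.

```sas
proc iml;
/* Read each variable into its own vector instead of a shared matrix, */
/* so classif and time each keep their own length.                    */
use work.data1;
read all var {classif time};
close work.data1;

/* CREATE takes each variable's length from the corresponding vector. */
create work.data2 var {classif time};
append;
close work.data2;
quit;
```

With this version the later merge should no longer warn about differing lengths, provided nothing else in the program changes the vectors' lengths along the way.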

Upvotes: 3

Richard

Reputation: 27508

From "Understanding the SAS/IML Language" in the SAS/IML documentation:

Defining a Matrix

A matrix is the fundamental structure in the SAS/IML language. A matrix is a two-dimensional array of numeric or character values. Matrices are useful for working with data and have the following properties:

  • Matrices can be either numeric or character. Elements of a numeric matrix are double-precision values. Elements of a character matrix are character strings of equal length.

The INTO clause places the character values into a single matrix, _temp_1, which must be able to hold all of the original values. Every element of that matrix therefore gets the length of the widest data set variable being read (here 9, the length of classif).

That element length is carried through the assignment statements, so the time vector, and the variable later written to data2, also has length 9.
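A small check of this behavior, offered as a sketch that assumes work.data1 from the question exists, prints the element length of the matrix and of a vector extracted from it with the NLENG function:

```sas
proc iml;
use work.data1;
read all var {classif time} into _temp_1;  /* one matrix, one common element length */
close work.data1;

time = _temp_1[,2];                        /* the column inherits the matrix's length */

print (nleng(_temp_1))[L="nleng(_temp_1)"]
      (nleng(time))[L="nleng(time)"];
quit;
```

If the explanation above holds, both values should match the length of the wider variable, which is consistent with the length of 9 reported in the question.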

Upvotes: 1
