CSV data got truncated by SAS

Question

I'm using SAS University Edition 9.4.

This is my CSV data.

     ,MGAAAAAAAA,3,A0000B   2F1
11111,ﾊｱﾝ12222234222B56122,4,AA  0000
     ,ﾃｽﾄﾃﾞｰﾀ,5,AACHY 2410F1
     ,ﾃｽﾄﾃﾞﾀﾃｽﾄﾃ,5,AACHYF2

This is my infile statement.

data wk01;
 infile '/folders/myfolders/data/test_csv.txt'
 dsd delimiter=','   
 lrecl=1000 missover firstobs=1;
 input firstcol  :$  secondcol    :$ thirdcol    :$ therest    :$;
run ;

I expected my result to look like this.

But after running the program, what I got is shown below (the yellow highlight indicates which rows/columns had their data truncated by SAS).

For example, the first row's second column is MGAAAAAAAA, but SAS's output is MGAAAAAA.

Could you please point out what I am missing here? Thanks a lot.

Tom · Accepted Answer

The values of your variables are longer than the 8 bytes you are allowing for them. UTF-8 characters can take up to 4 bytes each, and SAS character lengths are counted in bytes, not characters. It looks like some of the values are getting cut off in the middle of a character, so you end up with an invalid UTF-8 sequence.
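To see the gap between bytes and characters for yourself, here is a minimal sketch, assuming your session encoding is UTF-8 (which the symptoms suggest). The names `s`, `nbytes`, and `nchars` are made up for illustration. LENGTH counts bytes while KLENGTH counts characters; each half-width katakana character takes 3 bytes in UTF-8, so the 7-character value from your third row needs 21 bytes, well past the default of 8.

```sas
data _null_;
  length s $30;            /* 30 bytes: enough for 7 characters at 3 bytes each */
  s = 'ﾃｽﾄﾃﾞｰﾀ';          /* second column of the third row in the sample data */
  nbytes = length(s);      /* LENGTH returns the length in bytes */
  nchars = klength(s);     /* KLENGTH returns the length in characters */
  put nbytes= nchars=;     /* write both counts to the log */
run;
```

If the two numbers in the log differ, size the variable by the byte count, not the character count.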

Just define longer lengths for your variables instead of letting SAS use the default length of 8. In general it is best to explicitly define your variables with a LENGTH or ATTRIB statement, instead of forcing SAS to guess how to define them based on how you first use them in other statements like INPUT, FORMAT, INFORMAT or assignment.

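data wk01;
  /* TRUNCOVER keeps short lines from pulling data from the next line */
  infile '/folders/myfolders/data/test_csv.txt' dsd dlm=',' truncover ;
  /* Lengths are in bytes, so leave room for multi-byte UTF-8 characters */
  length firstcol $8 secondcol $30 thirdcol $30 therest $100;
  /* The variables are already defined as character, so no informats are needed */
  input firstcol secondcol thirdcol therest;
run ;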
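
The same step can also be sketched with ATTRIB instead of LENGTH, which lets you set other attributes, such as a label, in the same place. The labels below are placeholders I made up for illustration, not anything taken from your data.

```sas
data wk01;
  infile '/folders/myfolders/data/test_csv.txt' dsd dlm=',' truncover;
  /* ATTRIB defines each variable before INPUT sees it; lengths are in bytes */
  attrib firstcol  length=$8   label='First column'    /* hypothetical label */
         secondcol length=$30  label='Second column'   /* hypothetical label */
         thirdcol  length=$30
         therest   length=$100;
  input firstcol secondcol thirdcol therest;
run;
```

Either way, the point is that the variables are defined before the INPUT statement, so SAS never falls back to the default length of 8.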
