Bhavani K
Bhavani K

Reputation: 165

How to convert a SAS dataset into CSV file whereas a single filed in it has value with comma

I have a SAS dataset, let us say

it has 4 columns A,B,C,D and the values

A = x
B = x
C = x
**D = x,y**

Here column D has two values inside a single column while converting it into CSV format it generates a new column with the value Y. How to avoid this and to convert SAS dataset into CSV file?

 * get some test records in a file;
Data _null_;
    file 'c:\tmp\test.txt' lrecl=80;
    put '1,22,Hans Olsen,Denmark,333,4';
    put '1111,2,Turner, Alfred,England,3333,4';
    put '1,222,Horst Mayer,Germany,3,4444';
run;

* Read the file as a delimited file; 
data test; infile 'c:\tmp\test.txt' dsd dlm=',' missover;
    length v1 v2 8 v3 v4 $40 v5 v6 8;
    input
        'V1'n : ?? BEST5.
        'V2'n : ?? BEST5.
        'V3'n : $CHAR40.
        'V4'n : $CHAR40.
        'V5'n : ?? BEST5.
        'V6'n : ?? BEST5.;
run;

* Read the file and write another file.
* If 6 delimiters and not 5, change the third to #;
data test2; 
    infile 'c:\tmp\test.txt' lrecl=80 truncover;
    file 'c:\tmp\test2.txt' lrecl=80;
    length rec $80;
    drop pos len;
    input rec $char80.;
    if count(rec,',') = 6 then do;
        call scan(rec,4,pos,len,',');
        substr(rec,pos-1,1) = '','';
    end;
    put rec;
run;

* Read the new file as a delimited file; 
data test2; infile 'c:\tmp\test2.txt' dsd dlm=',' missover;
    length v1 v2 8 v3 v4 $40 v5 v6 8;
    input
        'V1'n : ?? BEST5.
        'V2'n : ?? BEST5.
        'V3'n : $CHAR40.
        'V4'n : $CHAR40.
        'V5'n : ?? BEST5.
        'V6'n : ?? BEST5.;
run;

In this code, it add '#' but I want ',' itself in the output. Could anyone please guide me to do that? Thanks in advance!!

Upvotes: 0

Views: 885

Answers (2)

Tom
Tom

Reputation: 51566

It sounds like you are starting with an improperly created CSV file.

1,22,Hans Olsen,Denmark,333,4
1111,2,Turner, Alfred,England,3333,4
1,222,Horst Mayer,Germany,3,4444

That should have been made like this:

1,22,Hans Olsen,Denmark,333,4
1111,2,"Turner, Alfred",England,3333,4
1,222,Horst Mayer,Germany,3,4444

If you are positive that you know that the only field with embedded commas is the third then you can use a data step to read it in and generate a valid file.

data _null_;
  infile bad dsd truncover ;
  file good dsd ;
  length v1-v6 dummy $200;
  input v1-v2  @;
  do i=1 to countw(_infile_,',','q')-5;
    input dummy @;
    v3=catx(', ',v3,dummy);
  end;
  input v4-v6 ;
  put v1-v6 ;
run;

Once you have a properly formatted CSV file then it is easy to read.

data want;
  infile good dsd truncover ;
  length v1-v2 8 v3-v4 $40 v5-v6 8;
  input v1-v6 ;
run;

But if the extra comma could be in any field then you will probably need to have a human fix those lines.

Upvotes: 4

Richard
Richard

Reputation: 27498

If your field value contains the field delimiter you will want to double quote the field value. Proc EXPORT will do such double quoting when the data base type is specified as CSV

Example:

data have;
A = 1;
B = 2;
C = 3;
D = 'x,y';
run;

filename csv temp;

proc export data=have outfile=csv dbms=csv;
run;

data _null_;
  infile csv;
  input;
  put _infile_;
run;

The log will show the exported file contains double quoted values as needed in the csv file produced.

Log

A,B,C,D
1,2,3,"x,y"

Upvotes: 2

Related Questions