Papa Yatma Diop
Papa Yatma Diop

Reputation: 21

Create a table from one line CSV data on SAS

I try to import data from a csv with only one line data formatted like this :

CAS$#$#$LLT_CODE$#$#$PT_CODE$#$#$HLT_CODE$#$#$HLGT_CODE$#$#$SOC_CODE$#$#$LLT$#$#$PT$#$#$HLT$#$#$HLGT$#$#$SOC$#$#$SOC_ABB#$#$#DJ20210005-0$#$#$10001896$#$#$10012271$#$#$10001897$#$#$10057167$#$#$10029205$#$#$Maladie d'Alzheimer$#$#$Démence de type Alzheimer$#$#$Maladie d'Alzheimer (incl sous-types)$#$#$Déficiences mentales$#$#$Affections du système nerveux$#$#$Nerv#$#$#DJ20210005-0$#$#$10019308$#$#$10003664$#$#$10007607$#$#$10007510$#$#$10010331$#$#$Communication interauriculaire$#$#$Communication interauriculaire$#$#$Défauts congénitaux du septum cardiaque$#$#$Troubles congénitaux cardiovasculaires$#$#$Affections congénitales, familiales et génétiques$#$#$Cong#$#$#

"#$#$#" determine end of line and "$#$#$" separe columns.

How can i do to import it ?

Here's my code :

data a; 
    infile "C:/Users/Papa Yatma/Documents/My SAS Files/9.4/ATCD.txt" dlm="$" dsd ; 
    input var1 $ var2 $ var3 $ var4 $ var5 $ var6 $ 
       var7 $ var8 $ var9 $ var10 $ var11 $ var12 $ @@; 
run;

Thank you for your help.

Upvotes: 0

Views: 114

Answers (1)

Tom
Tom

Reputation: 51566

As long as the actual "records" are not too long I would use the DLMSTR= option to process the file twice. First to parse the "records" into lines. Then to read the fields from the lines.

So first make a new text file that has one line per record.

filename new temp;
data _null_;
  infile have recfm=n lrecl=1000000 dlmstr='#$#$#';
  file new ;
  input line :$32767. @;
  put line ;
run;

Now you can read the file NEW using the other delimiter string.

For example you could convert it to a real CSV file.

filename csv temp;
data _null_;
  infile new dlmstr='$#$#$' length=ll column=cc truncover ;
  file csv dsd ;
  do until(cc>=ll);
    input word :$32767. @ ;
    put word @;
  end;
  put;
run;

Results:

CAS,LLT_CODE,PT_CODE,HLT_CODE,HLGT_CODE,SOC_CODE,LLT,PT,HLT,HLGT,SOC,SOC_ABB
DJ20210005-0,10001896,10012271,10001897,10057167,10029205,Maladie d'Alzheimer,Démence de type Alzheimer,Maladie d'Alzheimer (incl sous-types),Déficiences mentales,Affections du système nerveux,Nerv
DJ20210005-0,10019308,10003664,10007607,10007510,10010331,Communication interauriculaire,Communication interauriculaire,Défauts congénitaux du septum cardiaque,Troubles congénitaux cardiovasculaires,"Affections congénitales, familiales et génétiques",Cong

This CSV file is then easy to read:

data test;
  infile csv dsd firstobs=2 truncover ;
  length CAS LLT_CODE PT_CODE HLT_CODE HLGT_CODE SOC_CODE LLT PT HLT HLGT SOC SOC_ABB $100;
  input CAS -- SOC_ABB;
run;

If it is possible any of the values might include end of line characters then you should add code to replace those in the first step. For example you might add this line to replace CRLF strings with pipe characters.

line = tranwrd(line,'0D0A'x,'|');

Upvotes: 3

Related Questions