Reputation: 23
I am trying to figure out how to read data from a text file (300 MB) in which the values are separated by commas but everything is on a single line.
Data looks like this:
a,b,c,d,e,f,g,h,i,j,k,l,m,false,false,true,1,379,0,,1,1,1,1,1,1,0,1,0,6,0,6,0,6,6,6,6,6,6,6,6,6,0,6,0,0,0,0,0,0,0
Here the values from a to m are the variable names, and the rest is the data for those variables. Can anyone please help me read this data into SAS?
Thanks so much!
Upvotes: 2
Views: 2490
Reputation: 17077
You can remove the variable names (a, b, ..., m) from the file and do this:
data a;
    infile 'C:\example.txt' dlm=',' dsd;
    input a $ b $ c $ d $ e $ f $ g $ h $ i $ j $ k $ l $ m $ @@;
run;
The @@ holds the current line across iterations of the data step, so SAS keeps reading from the same line instead of moving to the next line once it has read the value of the last variable (m).
Upvotes: 0
Reputation: 7119
Why not something simple like this:
DATA test;
INFILE 'your_huge_file.csv' DSD;
INPUT a $ b $ c $ d $ e $ f $ g $ h $ i $ j $ k $ l $ m @@;
IF a = 'a' THEN DELETE; * This will exclude the "headers";
RUN;
Upvotes: 1
Reputation: 63424
Your best bet is going to be to read it in with two passes: a line-delimiting step and a readin step.
I would suggest starting by using TERMSTR="," on the INFILE statement, so that you have a huge number of lines with one field each. Then figure out where each real line should terminate, and write each group of fields out as a single line to a file with the normal line terminator for your OS.
Then you can read that in with normal readin methods.
For example, imagine I have a file with this line:
a,b,c,d,e,f,1,2,3,4,5,6,7,8,9,10,11,12
Then I could read it in like this.
filename oneline "c:\temp\oneline.csv";
filename intermed temp;
%let numfields=6;
data _null_;
    infile oneline termstr=",";
    file intermed dlm=',';
    do _i = 1 to &numfields;
        input line $;
        putlog line;
        put line @;
    end;
    put;
run;
data want;
infile intermed dlm=',' firstobs=2;
input a b c d e f;
run;
You could also add some more code to parse the first line and put it in a macro variable or include file that you then use to generate the input line in the later data step, but I leave that as an exercise for the reader.
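One way that header-parsing step might look is sketched below. It assumes the `intermed` fileref from the code above is still assigned in the same session, that its first line holds the comma-separated variable names, and that every variable is numeric, as in the example. The macro variable name `varlist` is made up for illustration.

```sas
/* Sketch: build the INPUT variable list from the header line of intermed. */
data _null_;
    infile intermed obs=1;          /* read only the header record */
    input;                          /* load the record into _infile_ */
    /* turn "a,b,c,d,e,f" into "a b c d e f" and store it in &varlist */
    call symputx('varlist', translate(strip(_infile_), ' ', ','));
run;

%put varlist=&varlist;

data want;
    infile intermed dlm=',' firstobs=2;   /* skip the header record */
    input &varlist;                       /* all variables read as numeric */
run;
```

If some variables are character, the generated list would need `$` after those names, which this sketch does not attempt.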
Upvotes: 6
Reputation: 9569
You can use a double trailing @ in your input statement, e.g.
data example;
infile cards dlm=',';
input a b @@;
cards;
1,2,3,4,5,6
;
run;
This may cause some errors when it tries to read the column headers but it should be ok for subsequent iterations.
To get around the lrecl-related crashes, you could instead replace commas with line breaks using an external utility (e.g. GNU sed) before importing the file into SAS, and then write an input statement that reads multiple lines from the transposed file to populate each record.
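A rough sketch of the SAS side of that approach follows. It assumes the file has already been transposed outside SAS so that each value sits on its own line, and that the 13 variables from the question (a to m) are all read as character. The file names are hypothetical.

```sas
/* Assumes the file was transposed beforehand, for example with GNU sed:
       sed 's/,/\n/g' oneline.csv > transposed.txt
   Both file names are placeholders for illustration. */
data want;
    /* truncover: an empty line (an empty field in the original data) gives a
       missing value instead of making SAS skip ahead to the next line.
       firstobs=14: skip the 13 lines that hold the variable names. */
    infile 'transposed.txt' truncover firstobs=14;
    /* each / moves the pointer to the next line, so one record spans 13 lines */
    input a $ / b $ / c $ / d $ / e $ / f $ / g $
          / h $ / i $ / j $ / k $ / l $ / m $;
run;
```

Character variables read this way get the default length of 8, so a LENGTH statement would be needed before the INPUT statement if any value is longer than that.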
Upvotes: 1