Reputation: 2050
I have a huge CSV file (more than a few gigabytes) and would like to read it in MATLAB and process each line. Reading the file in its entirety is impossible, so I use this code to read it line by line:
fileName = 'input.txt';
inputfile = fopen(fileName);
while true
    tline = fgetl(inputfile);   % next line as a char vector, or -1 at end of file
    if ~ischar(tline)
        break
    end
    % process tline here
end
fclose(inputfile);
This yields a cell array of size (1,1) with the line as a string. What I would like is to convert this cell to a normal array containing just the numbers. For example:
input.csv:
0.0,0.0,3.201,0.192
2.0,3.56,0.0,1.192
0.223,0.13,3.201,4.018
End result in Matlab for the first line:
A = [0.0,0.0,3.201,0.192]
I tried converting tline with double(tline), but this yields completely different results. I also tried a regex but got stuck there: I got as far as splitting the values into separate cells of one array, but converting them with str2double yields only NaNs.
Any tips? Preferably without any loops since it already takes a while to read the entire file.
Upvotes: 0
Views: 5640
Reputation: 38042
You are looking for str2num:
>> A = '0.0,0.0,3.201,0.192';
>> str2num(A)
ans =
0 0 3.2010 0.1920
>> A = '0.0 0.0 3.201 0.192';
>> str2num(A)
ans =
0 0 3.2010 0.1920
>> A = '0.0 0.0 , 3.201 , 0.192';
>> str2num(A)
ans =
0 0 3.2010 0.1920
i.e., it's quite agnostic to the input format.
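As a side note on the attempts described in the question, here is a minimal sketch of why double() on the text gives "completely different results" and how str2double can be applied per field. The variable names are only illustrative, and strsplit is assumed to be available (it is in reasonably recent MATLAB releases).

```matlab
A = '0.0,0.0,3.201,0.192';

% double() on a char vector returns character codes, not the parsed numbers.
codes = double(A(1:3));          % codes of the characters '0', '.', '0'

% Split on the commas first, then convert each field on its own.
parts = strsplit(A, ',');        % 1-by-4 cell array of char vectors
vals  = str2double(parts);       % 1-by-4 double row vector
```

str2num is the more forgiving of the two because it evaluates the text as a MATLAB expression, which is also a reason to be careful with it on untrusted input.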
However, I would not advise this for your use case. For your problem, I'd do
C = dlmread('input.txt',',', [0 0 0 3]) % for first line (range is zero-based: [R1 C1 R2 C2])
C = dlmread('input.txt',',') % for entire file
or
[a,b,c,d] = textread('input.txt','%f,%f,%f,%f',1) % for first line
[a,b,c,d] = textread('input.txt','%f,%f,%f,%f') % for entire file
if you want all columns in separate variables:
a = 0
b = 0
c = 3.201
d = 0.192
or
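fid = fopen('input.txt','r');
% use one of the calls below; each call continues from the current file position
C = textscan(fid, '%f %f %f %f', 1, 'Delimiter', ','); % for first line only
C = textscan(fid, '%f %f %f %f', N, 'Delimiter', ','); % for first N lines
C = textscan(fid, '%f %f %f %f', 1, 'Delimiter', ',', 'headerlines', N-1); % for Nth line only
fclose(fid);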
all of which are much more easily expandable (things like this, whatever they are, tend to grow bigger over time :). dlmread in particular is much less error-prone than writing your own parsing for empty lines, missing values, and the other nuisances that are very common in most data sets.
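Since the file in the question is too large to load at once, the textscan variant can also be called repeatedly to work through the file in blocks. The following is only a sketch under assumptions: the block size, the temporary sample file, and the running column sums are made up for illustration, and four numeric columns are assumed as in the example data.

```matlab
% Write a tiny sample file so the sketch is self-contained (illustrative data).
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
fprintf(fid, '0.0,0.0,3.201,0.192\n2.0,3.56,0.0,1.192\n0.223,0.13,3.201,4.018\n');
fclose(fid);

blockSize = 2;            % rows per block; assumed value, tune to available memory
colSums = zeros(1, 4);    % example per-block processing: running column sums
nRows = 0;

fid = fopen(fileName, 'r');
while ~feof(fid)
    % Read up to blockSize rows; textscan resumes where the previous call stopped.
    C = textscan(fid, '%f %f %f %f', blockSize, ...
                 'Delimiter', ',', 'CollectOutput', true);
    block = C{1};         % numeric matrix with 4 columns (fewer rows at end of file)
    if isempty(block)
        break
    end
    colSums = colSums + sum(block, 1);
    nRows = nRows + size(block, 1);
end
fclose(fid);
delete(fileName);
```

Each block arrives as an ordinary numeric matrix, so whatever per-line processing is needed can be vectorized over the block instead of being done one line at a time.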
Upvotes: 3
Reputation: 51480
Try
data = dlmread('input.txt',',')
It will do exactly what you want to do.
If you still want to convert a string to a vector:
line_data = sscanf(tline,'%g,',inf)
This code reads the entire comma-separated string and converts each number; the result is a column vector.
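For completeness, here is a rough sketch of how the sscanf call could slot into the fgetl loop from the question. The temporary sample file and the rows cell array are assumptions added only to make the example runnable; in practice each line would be processed and discarded rather than stored.

```matlab
% Write a tiny sample file so the sketch is self-contained (illustrative data).
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
fprintf(fid, '0.0,0.0,3.201,0.192\n2.0,3.56,0.0,1.192\n');
fclose(fid);

fid = fopen(fileName, 'r');
rows = {};                          % collected here only for demonstration
tline = fgetl(fid);
while ischar(tline)                 % fgetl returns -1 (not a char) at end of file
    A = sscanf(tline, '%g,').';     % sscanf returns a column; transpose to a row
    rows{end+1} = A;                % replace with the real per-line processing
    tline = fgetl(fid);
end
fclose(fid);
delete(fileName);
```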
Upvotes: 0