Tim

Reputation: 2050

Matlab: Convert cell string (comma separated) to vector

I have a huge CSV file (more than a few gigabytes) and would like to read it in Matlab and process each line. Reading the file in its entirety is impossible, so I use this code to read it line by line:

fileName = 'input.txt';
inputfile = fopen(fileName);

while 1
    tline = fgetl(inputfile);
    if ~ischar(tline)
        break
    end
end
fclose(inputfile); 

This yields a cell array of size (1,1) with the line as a string. What I would like is to convert this cell to a normal array with just the numbers. For example:

input.csv:
0.0,0.0,3.201,0.192
2.0,3.56,0.0,1.192
0.223,0.13,3.201,4.018

End result in Matlab for the first line:

A = [0.0,0.0,3.201,0.192]

I tried converting tline with double(tline) but this yields completely different results. Also tried using a regex but got stuck there. I got to the point where I split up all values into a different cell in one array. But converting to double with str2double yields only NaNs...

Any tips? Preferably without any loops since it already takes a while to read the entire file.

Upvotes: 0

Views: 5640

Answers (2)

Rody Oldenhuis

Reputation: 38042

You are looking for str2num

>> A = '0.0,0.0,3.201,0.192';
>> str2num(A)
ans =
     0  0  3.2010  0.1920
>> A = '0.0 0.0 3.201 0.192';
>> str2num(A)
ans =
     0  0  3.2010  0.1920
>> A = '0.0     0.0 ,    3.201 , 0.192';
>> str2num(A)
ans =
     0  0  3.2010  0.1920

i.e., it's quite agnostic to the input format.

However, I would not advise this for your use case. For your problem, I'd do

C = dlmread('input.txt',',', [0 0 0 3]) % for first line (range is zero-based: [R1 C1 R2 C2], 4 columns here)
C = dlmread('input.txt',',')            % for entire file

or

[a,b,c,d] = textread('input.txt','%f,%f,%f,%f',1) % for first line
[a,b,c,d] = textread('input.txt','%f,%f,%f,%f')   % for entire file

if you want all columns in separate variables:

a = 0
b = 0
c = 3.201
d = 0.192

or

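fid = fopen('input.txt','r');
% the three calls below are alternatives; each one reads from the current file position
C = textscan(fid, '%f %f %f %f', 1, 'Delimiter', ','); % for first line only
C = textscan(fid, '%f %f %f %f', N, 'Delimiter', ','); % for first N lines
C = textscan(fid, '%f %f %f %f', 1, 'Delimiter', ',', 'HeaderLines', N-1); % for Nth line only
fclose(fid);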

All of these are much easier to extend (things like this, whatever they are, tend to grow bigger over time :). dlmread in particular is much less error-prone than writing your own handling for empty lines, missing values and other nuisances that are very common in most data sets.
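
For a file that is too large to load at once, the textscan call can be repeated on the same file handle, since each call continues from where the previous one stopped. A minimal sketch, assuming four numeric columns as in the example; the temporary file, blockSize and the running column sums are placeholders for illustration and not part of the original code:

```matlab
% Write the question's sample data to a temporary file (stand-in for the real input)
fileName = [tempname '.txt'];
fid = fopen(fileName, 'w');
fprintf(fid, '0.0,0.0,3.201,0.192\n2.0,3.56,0.0,1.192\n0.223,0.13,3.201,4.018\n');
fclose(fid);

blockSize = 2;            % hypothetical value; something like 1e5 suits a real file
colSum = zeros(1, 4);     % example of per-block processing: running column sums
nRows = 0;

fid = fopen(fileName, 'r');
while ~feof(fid)
    % Read up to blockSize rows; CollectOutput gathers the columns into one matrix
    C = textscan(fid, '%f %f %f %f', blockSize, ...
                 'Delimiter', ',', 'CollectOutput', true);
    block = C{1};
    if isempty(block)
        break             % nothing could be parsed, so stop instead of looping forever
    end
    colSum = colSum + sum(block, 1);
    nRows = nRows + size(block, 1);
end
fclose(fid);
delete(fileName);
```

Only one block is held in memory at a time, so blockSize can be tuned to whatever fits comfortably.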

Upvotes: 3

Leonid Beschastny

Reputation: 51480

Try

data = dlmread('input.txt',',')

It will do exactly what you want to do.

If you still want to convert a string to a vector:

line_data = sscanf(line,'%g,',inf)

This code reads the entire comma-separated string and converts each number; the result is a column vector.
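
Combined with the line-by-line loop from the question, this might look as follows. A minimal sketch, assuming every line contains only comma-separated numbers; the temporary file and the rows cell array are placeholders for illustration. Passing [1 inf] as the size argument makes sscanf return a row vector instead of a column:

```matlab
% Write the question's sample data to a temporary file (stand-in for the real input)
fileName = [tempname '.txt'];
fid = fopen(fileName, 'w');
fprintf(fid, '0.0,0.0,3.201,0.192\n2.0,3.56,0.0,1.192\n0.223,0.13,3.201,4.018\n');
fclose(fid);

rows = {};                 % kept only to show the result; a real run would
                           % process A inside the loop and discard it
inputfile = fopen(fileName);
while true
    tline = fgetl(inputfile);          % fgetl returns -1 (not a char) at end of file
    if ~ischar(tline)
        break
    end
    A = sscanf(tline, '%g,', [1 inf]); % parse one line into a 1-by-n row vector
    rows{end+1} = A;
end
fclose(inputfile);
delete(fileName);

firstRow = rows{1};        % the numbers from the first line
```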

Upvotes: 0
