Reputation: 3488
I have a very large CSV file containing three columns. I want to load these columns into a MATLAB matrix as fast as possible.
Currently I do this:
fid = fopen(inputfile, 'rt');
g = textscan(fid,'%s','delimiter','\r\n');
tdata = g{1};
fclose(fid);
results = zeros([numel(tdata)-4], 3);
tic
display('start reading data...');
for r = 4:numel(tdata)
    if ~mod(r, 100)
        display(['data row: ' num2str(r) ' / ' num2str(numel(tdata))]);
    end
    entries = strsplit(tdata{r}, ',');
    results(r-3,1) = str2double(strrep(entries{1},',', '.'));
    results(r-3,2) = str2double(strrep(entries{2},',', '.'));
    results(r-3,3) = str2double(strrep(entries{3},',', '.'));
end
However, this takes ~30 seconds for 200 000 lines, i.e. about 150 µs per line, which is really slow. The code is also not accepted by parfor.
I would like to know what causes the bottleneck in the for loop and how I can speed it up.
Here are the measured times:
str2double 578253 calls 29.631s
strsplit 192750 calls 13.388s
EDIT: The content of the file has this structure:
0.000000, -0.00271, 5394147
0.000667, -0.00271, 5394148
0.001333, -0.00271, 5394149
0.002000, -0.00271, 5394150
Upvotes: 0
Views: 98
Reputation: 1845
I think a lot can be improved by calling textscan differently.
You do this:
g = textscan(fid,'%s','delimiter','\r\n');
and then take tdata = g{1};, which only gives you the raw lines as strings.
If textscan is called with a numeric format, it splits the data itself and returns it as numbers.
Try this:
g = textscan(fid,'%f,%f,%f','delimiter','\r\n');
It should return a 1-by-3 cell array in which each cell holds one column of your values. To convert it to a matrix you can use:
g=cell2mat(g)
I imported 200k lines in 0.12 seconds.
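To make the idea concrete, here is a minimal, self-contained sketch. It is a variant rather than the exact call above: it passes the comma as the 'Delimiter' instead of embedding it in the format string. The file name sample.csv and the two sample rows are assumptions made up for illustration, and this version has not been timed.

```matlab
% Write a tiny sample file in the layout shown in the question
% (file name and rows are illustrative assumptions).
inputfile = fullfile(tempdir, 'sample.csv');
sample = [0.000000 -0.00271 5394147; ...
          0.000667 -0.00271 5394148];
fid = fopen(inputfile, 'wt');
fprintf(fid, '%.6f, %.5f, %d\n', sample.');   % transpose so rows are written in order
fclose(fid);

% Let textscan parse the three numeric columns directly.
fid = fopen(inputfile, 'rt');
g = textscan(fid, '%f %f %f', 'Delimiter', ',');
fclose(fid);

results = cell2mat(g);   % N-by-3 double matrix, one row per line of the file
```

Because the parsing happens inside textscan, there is no per-line strsplit or str2double call left in MATLAB code, which is where the measured time was going.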
Your code also seems to contain some other workarounds. You start at r=4, so it seems there are 3 lines at the top that you don't want to read. After fopen you can therefore call the following 3 times
[~] =fgetl(fid)
to get to the interesting part of your file.
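As an alternative to calling fgetl by hand, textscan has a 'HeaderLines' option that skips a given number of lines before parsing. The sketch below assumes exactly 3 header lines (inferred from the loop starting at r=4) and uses a made-up file name and made-up header text.

```matlab
% Build a sample file with 3 header lines (assumed count) and two data rows.
inputfile = fullfile(tempdir, 'sample_with_header.csv');   % hypothetical name
fid = fopen(inputfile, 'wt');
fprintf(fid, 'header line 1\nheader line 2\nheader line 3\n');
fprintf(fid, '0.000000, -0.00271, 5394147\n');
fprintf(fid, '0.000667, -0.00271, 5394148\n');
fclose(fid);

% Skip the header lines and parse the numeric columns in one call.
fid = fopen(inputfile, 'rt');
g = textscan(fid, '%f %f %f', 'Delimiter', ',', 'HeaderLines', 3);
fclose(fid);

results = cell2mat(g);   % 2-by-3 for this sample file
```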
You also first split the line with ',' as the separator, and then replace all ',' by '.'. That does nothing: the commas are already gone, because they were used as separators.
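A quick way to check this on one row from the question's sample (the variable names here are only for illustration):

```matlab
row = '0.000667, -0.00271, 5394148';      % one line of the sample data
entries = strsplit(row, ',');             % commas are consumed by the split

% No comma is left in any entry, so strrep has nothing to replace.
unchanged = strrep(entries{2}, ',', '.');
sameText = isequal(unchanged, entries{2});   % true: the strrep call is a no-op

value = str2double(entries{2});           % leading space is ignored by str2double
```

So the three strrep calls in the loop can be dropped without changing the result.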
Upvotes: 1