Reputation: 183
I have a CSV file containing 2D arrays, each with 4 columns but a varying number of rows, with a blank line between consecutive arrays. E.g.:
2, 354, 23, 101
3, 1023, 43, 454
1, 5463, 45, 7657

4, 543, 543, 654
3, 56, 7654, 344
...
I need to import the data so that I can run operations on each block separately. However, csvread, dlmread and textscan all ignore the blank lines.
I can't find a solution anywhere. How can this be done?
PS:
It may be worth noting that files in the format above are actually the concatenation of many files, each containing a single block of data (I don't want to read from thousands of files every time). The blank line between blocks can therefore be changed to any other delimiter or marker; the concatenation is done with a Python script.
EDIT: My solution, based upon / inspired by petrichor's answer below
I replaced csvread with textscan, which is faster. Then I realised that if I replaced the blank lines with lines of NaN instead (by modifying my Python script), I could remove the need for the second textscan, which was the slow point. My code is:
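filename = 'data.csv';
% Read every row, including the NaN marker rows, into one numeric matrix
fid = fopen(filename);
allData = cell2mat(textscan(fid,'%f %f %f %f','delimiter',','));
fclose(fid);
% Row indices of the NaN marker rows (assumes one follows every block, including the last)
nanLines = find(isnan(allData(:,1)))';
% First and last row of each block, counted after the marker rows are removed
iEnd = (nanLines - (1:length(nanLines)));
iStart = [1 (nanLines(1:end-1) - (0:length(nanLines)-2))];
nRows = iEnd - iStart + 1;
% Drop the marker rows and split the remaining rows into one cell per block
allData(nanLines,:)=[];
data = mat2cell(allData, nRows);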
This runs in 0.28 s on a file of roughly 103000 lines. I've accepted petrichor's solution, as it best solves my initial problem.
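The Python concatenation script itself is not shown in the post. As a rough sketch of what the modified version might look like, assuming each input file holds exactly one block: the function name `concatenate_blocks`, the marker text `nan, nan, nan, nan`, and the file names in the usage note are all hypothetical. The marker row is written after every block, including the last one, because the index arithmetic in the MATLAB code above counts one NaN row per block.

```python
from pathlib import Path

# Marker row written after each block. The exact text is an assumption;
# the post only says the blank lines were replaced with "lines of nan".
NAN_LINE = "nan, nan, nan, nan"


def concatenate_blocks(block_files, out_path):
    """Join single-block CSV files into one file, with a NaN row after each block."""
    with open(out_path, "w", newline="\n") as out:
        for block_file in block_files:
            text = Path(block_file).read_text()
            # Keep only non-empty lines so no stray blank line reaches the output.
            rows = [line.strip() for line in text.splitlines() if line.strip()]
            if not rows:
                continue  # skip files that contain no data rows
            out.write("\n".join(rows) + "\n")
            out.write(NAN_LINE + "\n")
```

A call such as `concatenate_blocks(["block1.csv", "block2.csv"], "data.csv")` would then produce a single file in which every block is terminated by one marker row.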
Upvotes: 3
Views: 2158
Reputation: 6579
filename = 'data.txt';
%# Read all the data
allData = csvread(filename);
%# Compute the empty line indices
fid = fopen(filename);
lines = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
blankLines = find(cellfun('isempty', lines{1}))';
%# Find the indices to separate data into cells from the whole matrix
iEnd = [blankLines - (1:length(blankLines)) size(allData,1)];
iStart = [1 (blankLines - (0:length(blankLines)-1))];
nRows = iEnd - iStart + 1;
%# Put the data into cells
data = mat2cell(allData, nRows)
That gives the following for your data:
data =
[3x4 double]
[2x4 double]
Upvotes: 1