sco1

Reputation: 12214

Most efficient way to split character array into cells?

I'm currently working on rewriting one of my older data processing functions and I have an optimization question. The main purpose of the function is to process & compile up to 39 columns of 1000Hz DAQ data from a *.csv file, so I'm dealing with a fairly large amount of data.

The original function was 'dumb' to the number of lines in the file and simply concatenated the data chunks read in by textscan with the existing array.

Example pseudocode:

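some_variable = [];  % initialize empty; the array grows on every pass
while ~feof(fid)
    % read the next chunk_size rows from the file
    segarray = textscan(fid, format, chunk_size, 'Delimiter', ',');
    % append the first column; this reallocates the whole array each time
    some_variable = [some_variable; segarray{1}];
end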

Painfully inefficient but when I wrote it I didn't know any better.

Anyway, my new function leverages a quick couple lines of Perl (found over at the MATLAB Newsgroup, I primarily work on Windows) to count the number of lines so I can intelligently initialize all of my data arrays.
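As a rough sketch of the preallocation idea (not the actual function), the loop below fills a preallocated array chunk by chunk. The names nLines, chunkSize, and data are hypothetical, and the chunk is synthetic data standing in for one column of textscan output.

```matlab
% Hypothetical values: nLines would come from the Perl line count
nLines    = 12;
chunkSize = 5;

data = zeros(nLines, 1);   % allocate once, up front
row  = 0;                  % number of rows filled so far

while row < nLines
    n = min(chunkSize, nLines - row);   % last chunk may be short
    chunk = ones(n, 1);                 % stand-in for a textscan column
    data(row+1 : row+n) = chunk;        % write in place, no reallocation
    row = row + n;
end
```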

This has given me a significant increase in speed but uncovered a couple implementation issues, one of which is the one I'd like to ask about. A few of the columns are single character fields (%c in the textscan format spec) that textscan concatenates into a single character array on its output. So for a chunk size of 5000, the output from textscan in that column is a 5000x1 character array (with no delimiter). I'd like to split this into a 5000x1 cell array but I'm not sure what the most efficient way to do this is.

The line I came up with uses mat2cell: some_variable = mat2cell(segarray{:,1},ones(length(segarray{:,1}),1),1). It works fine, but is there a faster method?
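For reference, here is what that mat2cell call does on a small column. The variable col is made-up sample data standing in for one %c column from textscan.

```matlab
% Hypothetical stand-in for a %c column returned by textscan
col = ['a'; 'b'; 'c'; 'd'; 'e'];   % 5x1 char array

% Split into one row per cell: row sizes are all 1, column size is 1
out = mat2cell(col, ones(length(col), 1), 1);

disp(size(out));       % dimensions of the resulting cell array
disp(class(out{1}));   % type of a single cell's contents
```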

Upvotes: 1

Views: 454

Answers (1)

user2271770

If you check the code for mat2cell:

edit mat2cell  % at Command window

you'll see that it is implemented quite simply: it allocates the cell array according to the requested dimensions, then fills the cells with slices of the input in a for loop.

This suggests writing an analogous special-purpose function that handles only this case:

function C = str2cell(S)
    N = numel(S);
    C = cell(N, 1);
    for k = 1:N
        C{k} = S(k);
    end;
end

I can't imagine anything faster than this, and, contrary to common wisdom, for loops are quite fast in recent versions of MATLAB.
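If you want to check this on your own machine, a simple tic/toc comparison along these lines should do. This is only a sketch: S is synthetic data, the loop is inlined rather than called as a function, and the timings will vary with MATLAB version and hardware.

```matlab
% Synthetic 5000x1 char column, matching the chunk size in the question
S = repmat('x', 5000, 1);

% Time the mat2cell approach
tic;
A = mat2cell(S, ones(numel(S), 1), 1);
tMat2cell = toc;

% Time the plain loop (same body as the special-purpose function)
tic;
N = numel(S);
C = cell(N, 1);
for k = 1:N
    C{k} = S(k);
end
tLoop = toc;

fprintf('mat2cell: %.6f s, loop: %.6f s\n', tMat2cell, tLoop);
```

Both approaches should produce identical cell arrays, so isequal(A, C) is a quick sanity check before comparing the times.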

Please note that if your S parameter is a 2D matrix, the function will "unwind" it in column-major order into a single N-by-1 (column) cell array.
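To see how a 2D input gets unwound, here is a small sketch that inlines the same loop on a made-up 2x2 char matrix:

```matlab
S = ['ab'; 'cd'];   % 2x2 char matrix (sample data)

N = numel(S);
C = cell(N, 1);
for k = 1:N
    C{k} = S(k);    % linear indexing walks down each column first
end

% C is a 4x1 cell array holding 'a', 'c', 'b', 'd' in that order
```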

Upvotes: 2
