memyself

Reputation: 12618

Why are access times to data stored in cells shorter than in matrices?

I'm dealing with very large data sets in MATLAB and used to store them in matrices, indexed by row. Since MATLAB stores data column-wise, I understand that reshaping my matrix so that I index by column makes handling faster. Here's an example of what I mean:


general parameters

nbr_channels = 20;
nbr_samples_per_channel = 3200000;
fake_data = randn(1, nbr_samples_per_channel);
ROI = 1200000 : 2800000;

assign data by row

data = nan(nbr_channels, nbr_samples_per_channel);
tic; 
for j = 1 : nbr_channels
    data(j, 1:nbr_samples_per_channel) = fake_data; 
end; 
toc;

% Elapsed time is 1.476525 seconds.

return data from row matrix

tic; 
for j = 1 : nbr_channels
    bla = data(j, ROI); 
end; 
toc;

% Elapsed time is 0.572162 seconds.

return all data from row matrix

tic; 
for j = 1 : nbr_channels
    bla = data(j, :); 
end; 
toc;

% Elapsed time is 0.589489 seconds.

assign data by column

data = nan(nbr_samples_per_channel, nbr_channels);
tic; 
for j = 1 : nbr_channels
    data(1:nbr_samples_per_channel, j) = fake_data; 
end; 
toc;

% Elapsed time is 0.299682 seconds.

return data from column matrix

tic; 
for j = 1 : nbr_channels
    bla = data(ROI, j); 
end; 
toc;

% Elapsed time is 0.260824 seconds.

return all data from column matrix

tic; 
for j = 1 : nbr_channels
    bla = data(:, j); 
end; 
toc;

% Elapsed time is 0.092983 seconds.

Summary Part1:

As we can see, accessing the data by column reduces the handling times by at least a factor of two!
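The column-wise storage behind this result can be seen on a tiny matrix. This is only an illustrative sketch; the variable names `A` and `linearOrder` are made up for the example. Flattening with `A(:)` walks the elements in storage order, so the elements of one column come out next to each other, while the elements of one row are spread apart.

```matlab
% A 2x3 matrix, written row by row in the source code
A = [1 2 3; 4 5 6];

% A(:) lists the elements in storage order: column 1, then column 2, ...
linearOrder = A(:).';

disp(linearOrder);   % 1 4 2 5 3 6
```

Reading `data(:, j)` therefore touches one contiguous block, whereas `data(j, :)` has to jump through memory with a stride equal to the number of rows.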

But I don't understand why cells are even more efficient! Have a look at this example:

assign data by cell

data = cell(1, nbr_channels);
tic; 
for j = 1 : nbr_channels
    data{j} = fake_data; 
end; 
toc;

% Elapsed time is 0.000013 seconds.

return data from cell array

tic; 
for j = 1 : nbr_channels
    bla = data{j}(ROI); 
end; 
toc;

% Elapsed time is 0.260294 seconds.

return all data from cell array

tic; 
for j = 1 : nbr_channels
    bla = data{j}; 
end; 
toc;

% Elapsed time is 0.000022 seconds.

%%

Summary Part2:

This is orders of magnitude faster than what I have shown in Part1.

Question 1

Why are access times to data stored in cells shorter than in matrices?

Question 2

Working with matrices is generally easier than with cells, because with a matrix one can do

my_matrix(100:20000, 1:3)

but with cells I can't do this (as far as I know). Any alternatives on how to return specific elements from multiple cells at the same time?

Upvotes: 4

Views: 160

Answers (1)

Andrew

Reputation: 1187

You are seeing different times because you are not doing equivalent things. To compare two of your cases:

assign data by cell

  • You are creating a cell array row vector and putting a long vector of doubles into each cell.

  • Each loop iteration results in 1 assignment of a vector into a single slot in a cell array.

  • There are only 'nbr_channels' assignments being done in total.

assign data by column

  • You are going through the columns of a matrix and assigning one element of the vector to each element of the column.

  • Each loop iteration, regardless of the shorthand colon : notation you used, resolves into many assignments. data(1:nbr_samples_per_channel, j) means 'nbr_samples_per_channel' assignments PER iteration.

  • Overall, you are doing 'nbr_samples_per_channel' * 'nbr_channels' total assignments.

To make my point, just re-write the loop without the colon operator to visualize all the assignments.

n = length(fake_data);

for j = 1 : nbr_channels

    data(1,     j) = fake_data(1); 
    data(2,     j) = fake_data(2); 

    ... etc ...

    data(n - 1, j) = fake_data(n - 1); 
    data(n,     j) = fake_data(n); 

end

To conclude, you are comparing two different operations, so you can't really say that one is faster than the other: they are not equivalent.
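The difference in the amount of work can be made explicit by counting it. The following is a minimal sketch with small, made-up sizes; the counters `elementCopies` and `slotAssignments` and the names `matData` and `cellData` exist only for this illustration.

```matlab
% Small hypothetical sizes so the example runs instantly
nbr_channels = 4;
n = 1000;
fake_data = randn(1, n);

% Matrix version: every iteration copies n elements into column j
matData = nan(n, nbr_channels);
elementCopies = 0;
for j = 1 : nbr_channels
    matData(:, j) = fake_data;
    elementCopies = elementCopies + n;
end

% Cell version: every iteration fills exactly one slot
cellData = cell(1, nbr_channels);
slotAssignments = 0;
for j = 1 : nbr_channels
    cellData{j} = fake_data;
    slotAssignments = slotAssignments + 1;
end

fprintf('matrix: %d element copies, cell: %d slot assignments\n', ...
    elementCopies, slotAssignments);
```

Both containers end up holding the same values, but the matrix loop moves n * nbr_channels numbers while the cell loop performs only nbr_channels assignments. My understanding is that MATLAB also defers the actual copy when a whole array is assigned to a cell slot or a variable (copy-on-write), which would fit the near-zero cell timings in the question; treat that as an assumption rather than something measured here.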

If you instead loop over a double array and a cell array and do one scalar assignment per iteration in each:

%% Setup samples and pre-allocate
numberOfSamples = 100000;

doubleData = nan(numberOfSamples, 1);
cellData = cell(numberOfSamples, 1);

randomValues = rand(numberOfSamples, 1);

%% Assign N number of values to a double array
tic; 
for idx = 1 : numberOfSamples
    doubleData(idx) = randomValues(idx);
end
doubleTime = toc;

%% Assign N number of values to a cell array
tic; 
for idx = 1 : numberOfSamples
    cellData{idx} = randomValues(idx);
end
cellTime = toc;

disp(sprintf('Double Array: %f seconds', doubleTime));
disp(sprintf('Cell   Array: %f seconds', cellTime));

You end up with:

Double Array: 0.006073 seconds
Cell   Array: 0.032966 seconds

For your second question, is this what you are trying to do?

>> bigCell = {1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16}

bigCell = 

    [ 1]    [ 2]    [ 3]    [ 4]
    [ 5]    [ 6]    [ 7]    [ 8]
    [ 9]    [10]    [11]    [12]
    [13]    [14]    [15]    [16]

>> subCell = bigCell(1:2, 3:4)

subCell = 

    [3]    [4]
    [7]    [8]

Notice that subCell is still a cell array. Indexing with ( ) rather than { } returns a cell array instead of the contents of its cells.
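If the goal is instead to pull the same index range out of the vector stored in each cell, one possible approach is cellfun with an anonymous function. This is a sketch on made-up data; `roiPerCell` and `roiMatrix` are hypothetical names, and the concatenation step assumes every cell holds a row vector that is long enough for the requested range.

```matlab
% Three short "channels" stored in a cell array
data = {1:10, 11:20, 21:30};
ROI = 3:5;

% Apply the same index range to every cell; the result is still a cell array
roiPerCell = cellfun(@(v) v(ROI), data, 'UniformOutput', false);

% Stack the extracted pieces into an ordinary matrix, one row per cell
roiMatrix = vertcat(roiPerCell{:});

disp(roiMatrix);
```

Once the pieces are in roiMatrix, the usual matrix indexing such as roiMatrix(1:2, 2:3) is available again.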

Upvotes: 3
