Reputation: 12618
I'm working with very large data sets in MATLAB and store them in matrices. I used to store my data by row, but since MATLAB stores data column-wise (column-major order), I understand that reshaping my matrix so that I index by column makes access faster. Here's an example of what I mean:
%% general parameters
nbr_channels = 20;
nbr_samples_per_channel = 3200000;
fake_data = randn(1, nbr_samples_per_channel);
ROI = 1200000 : 2800000;
%% assign data by row
data = nan(nbr_channels, nbr_samples_per_channel);
tic;
for j = 1 : nbr_channels
data(j, 1:nbr_samples_per_channel) = fake_data;
end;
toc;
% Elapsed time is 1.476525 seconds.
%% return data from row matrix
tic;
for j = 1 : nbr_channels
bla = data(j, ROI);
end;
toc;
% Elapsed time is 0.572162 seconds.
%% return all data from row matrix
tic;
for j = 1 : nbr_channels
bla = data(j, :);
end;
toc;
% Elapsed time is 0.589489 seconds.
%% assign data by column
data = nan(nbr_samples_per_channel, nbr_channels);
tic;
for j = 1 : nbr_channels
data(1:nbr_samples_per_channel, j) = fake_data;
end;
toc;
% Elapsed time is 0.299682 seconds.
%% return data from column matrix
tic;
for j = 1 : nbr_channels
bla = data(ROI, j);
end;
toc;
% Elapsed time is 0.260824 seconds.
%% return all data from column matrix
tic;
for j = 1 : nbr_channels
bla = data(:, j);
end;
toc;
% Elapsed time is 0.092983 seconds.
Summary Part1:
As we can see, accessing the data by column reduces the handling times by at least a factor of two!
But I don't understand why cells are even more efficient! Have a look at this example:
%% assign data by cell
data = cell(1, nbr_samples_per_channel);
tic;
for j = 1 : nbr_channels
data{j} = fake_data;
end;
toc;
% Elapsed time is 0.000013 seconds.
%% return data from cell array
tic;
for j = 1 : nbr_channels
bla = data{j}(ROI);
end;
toc;
% Elapsed time is 0.260294 seconds.
%% return all data from cell array
tic;
for j = 1 : nbr_channels
bla = data{j};
end;
toc;
% Elapsed time is 0.000022 seconds.
%%
Summary Part2:
This is orders of magnitude faster than what I have shown in Part1.
Question 1
Why are access times to data stored in cells shorter than to data stored in matrices?
Question 2
Working with matrices is generally easier than with cells, because with a matrix one can do
my_matrix(100:20000, 1:3)
but with cells I can't do this (as far as I know). Are there any alternatives for returning specific elements from multiple cells at the same time?
Upvotes: 4
Views: 160
Reputation: 1187
You are seeing different times because you are not doing equivalent things. To compare two of your cases:
assign data by cell
You are creating a cell array row vector and storing a long vector of doubles in each cell.
Each loop iteration results in one assignment of a whole vector into a single slot of the cell array.
There are only 'nbr_channels' assignments in total, one per loop iteration.
assign data by column
You are going through the columns of a matrix and assigning one value of the vector to each element of the column.
Each loop iteration, regardless of the shorthand colon : notation you used, resolves into many assignments. data(1:nbr_samples_per_channel, j) means 'nbr_samples_per_channel' assignments PER iteration.
Overall, you are doing 'nbr_samples_per_channel' * 'nbr_channels' total assignments.
To make my point, just re-write the loop without the colon operator to visualize all the assignments.
n = length(fake_data);
for j = 1 : nbr_channels
data(1, j) = fake_data(1);
data(2, j) = fake_data(2);
% ... etc ...
data(n - 1, j) = fake_data(n - 1);
data(n, j) = fake_data(n);
end
So, to conclude, you are comparing two different things, so you can't say one is really faster than the other, because they are not equivalent.
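To put numbers on the difference in work, here is a minimal sketch that counts what each loop does. The small sizes and the counter variables (cell_assignments, element_writes) are made up for illustration and are not part of the original code.

```matlab
% Hypothetical small sizes so the sketch runs instantly
nbr_channels = 4;
nbr_samples_per_channel = 1000;
fake_data = randn(1, nbr_samples_per_channel);

% Cell version: each iteration stores one whole vector in one cell slot
cell_data = cell(1, nbr_channels);
cell_assignments = 0;
for j = 1 : nbr_channels
    cell_data{j} = fake_data;
    cell_assignments = cell_assignments + 1;
end

% Matrix version: each iteration writes every element of one column
matrix_data = nan(nbr_samples_per_channel, nbr_channels);
element_writes = 0;
for j = 1 : nbr_channels
    matrix_data(:, j) = fake_data;
    element_writes = element_writes + numel(fake_data);
end

fprintf('Cell slots assigned:     %d\n', cell_assignments);
fprintf('Matrix elements written: %d\n', element_writes);
```

Both containers end up holding the same values, but the matrix loop touches nbr_samples_per_channel times as many elements as the cell loop has slots.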
If you instead loop over a double array and a cell array and do one scalar assignment per iteration in each:
%% Setup samples and pre-allocate
numberOfSamples = 100000;
doubleData = nan(numberOfSamples, 1);
cellData = cell(numberOfSamples, 1);
randomValues = rand(numberOfSamples, 1);
%% Assign N number of values to a double array
tic;
for idx = 1 : numberOfSamples
doubleData(idx) = randomValues(idx);
end
doubleTime = toc;
%% Assign N number of values to a cell array
tic;
for idx = 1 : numberOfSamples
cellData{idx} = randomValues(idx);
end
cellTime = toc;
disp(sprintf('Double Array: %f seconds', doubleTime));
disp(sprintf('Cell Array: %f seconds', cellTime));
You end up with:
Double Array: 0.006073 seconds
Cell Array: 0.032966 seconds
For your second question, is this what you are trying to do?
>> bigCell = {1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16}
bigCell =
[ 1] [ 2] [ 3] [ 4]
[ 5] [ 6] [ 7] [ 8]
[ 9] [10] [11] [12]
[13] [14] [15] [16]
>> subCell = bigCell(1:2, 3:4)
subCell =
[3] [4]
[7] [8]
Notice that subCell is still a cell array. Indexing with ( ) instead of { } returns a cell array rather than the contents of the cells.
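If the goal is closer to my_matrix(100:20000, 1:3), that is, the same range of samples from several channels at once, one possible approach is to combine ( ) indexing with cellfun. This is only a sketch: the tiny data set, the ROI, and the names parts and sub_matrix are assumptions chosen for illustration.

```matlab
% Hypothetical data: three channels stored as row vectors in a cell array
data = {1:10, 11:20, 21:30};
ROI = 3:5;   % samples to extract from each selected channel

% ( ) selects cells 1 and 3; cellfun applies the same index to each vector
parts = cellfun(@(v) v(ROI), data([1 3]), 'UniformOutput', false);

% Stack the equally sized pieces into an ordinary matrix, one row per channel
sub_matrix = vertcat(parts{:});
disp(sub_matrix);
```

The vertcat step only works when every extracted piece has the same length. If the pieces differ in length, keep the result as a cell array.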
Upvotes: 3