Reputation: 163
I am running into an out of memory issue when running simulations in MATLAB.
To give an easy example, let's say I have a MATLAB table / matrix / collection of vectors that look like this:
id | t | var
----+---+-----
1 | 1 | 100
1 | 2 | 150
2 | 2 | 200
2 | 3 | 90
2 | 4 | 980
where id
denotes individuals, t
denotes time periods, and var
is a numerical variable.
I need to sum the different values of var
for a given individual id
over time t
. The easiest way I could find was to do run the following:
idx = sparse(id == id');
sumvar = idx*sumvar;
which produces the desired result (a vector with the same length as var
where each element is the sum for a given id
).
id | t | var | sumvar
----+---+-----+--------
1 | 1 | 100 | 250
1 | 2 | 150 | 250
2 | 2 | 200 | 1270
2 | 3 | 90 | 1270
2 | 4 | 980 | 1270
The problem is that computing idx
is extremely computer intensive, and with vectors of the order of 150,000 my computer runs out of memory.
One potential solution is to use the following code:
len = length(id);
idx = sparse(len,len);
for i = 1:len
idx(id == id(i),:) = 1;
end
But that seems to be quite slow.
I feel this is a problem someone else may have already faced. Is there anything that could be both non-computing intensive but also fast enough?
Upvotes: 2
Views: 86
Reputation: 30047
For your example table T
:
id | t | var
----+---+-----
1 | 1 | 100
1 | 2 | 150
2 | 2 | 200
2 | 3 | 90
2 | 4 | 980
You can use varfun
a = varfun( @sum, T, 'GroupingVariables', 'id', 'InputVariables', 'var' )
Result:
a =
2×3 table
id GroupCount sum_var
__ __________ _______
1 2 250
2 3 1270
Upvotes: 0
Reputation: 112659
You can try accumarray
, as follows. Let your data be
id = [1 1 2 2 2].';
var = [100 150 200 90 980].';
Assuming id
always contains integer entries starting at 1
:
result = accumarray(id, var);
gives
result =
250
1270
If id
is arbitrary, use:
[~, ~, id_int] = unique(id);
result = accumarray(id_int, var);
If you need the results repeated as with your code, add:
result_repeated = result(id_int);
Upvotes: 3
Reputation: 6863
You could try the following, still using a loop, but in a bit more efficient way by iterating over only the unique id
's.
id = [1 1 2 2 2].';
var = [100 150 200 90 980].';
unique_ids = unique(id); % get the unique ids
sum_var = NaN(size(var)); % init the sum_var vector
for k = unique_ids.' % loop over the ids
idx = find(id == k); % find indices per id
sum_var(idx) = sum(var(idx)); % sum per id
end
Or, if you just need a vector with the sum per id
:
unique_ids = unique(id); % get the unique idx
sum_var = NaN(size(unique_ids)); % init the sum_var vector
for k = 1:numel(unique_ids)
idx = find(id == unique_ids(k)); % find indices per id
sum_var(k) = sum(var(idx)); % sum per id
end
Update: It can also be done without find
, by using the indices per unique element that unique
can return. Assuming your data is sorted by id
you can do the following:
[unique_ids, start_idx, ~] = unique(id); % get the unique idx, and the first occuring idx per id
sum_var = NaN(size(unique_ids));
start_idx = [start_idx; numel(var)+1]; % append total number of elements+1 for last summation in loop below
for k = 1:numel(unique_ids)
ids = start_idx(k):start_idx(k+1)-1; % indices in table for specific id
sum_var(k) = sum(var(ids));
end
Upvotes: 1