Reputation: 33
I have a vector of information, say:
Info = [10, 20, 10, 30, 500, 400, 67, 350, 20, 105, 15];
and another vector of IDs, say:
Info_IDs = [1, 2, 1, 4, 2, 3, 4, 1, 3, 1, 2];
I would like to obtain a matrix that is defined as follows:
Result =
10 10 350 105
20 500 15 0
400 20 0 0
30 67 0 0
where every row holds the values of Info corresponding to a different ID. As this short example shows, the number of values per ID differs from row to row.
I'm working with large amounts of data (Info is 1x1000000 and Info_IDs is 1x25000), so I would like to build this Result matrix preferably without loops. One way I was thinking about is to compute the histogram per ID and store that instead (Result would then contain the binned info rather than the original info).
Thank you all in advance for your input.
Upvotes: 1
Views: 88
Reputation: 112679
If you don't mind having zeros in between:
number_Ids = 4; % set as required
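aux = bsxfun(@eq, Info_IDs, (1:number_Ids).'); % logical mask: row k marks the entries with ID k
sol = bsxfun(@times, Info, aux) % keep the values where the mask is true, zero elsewhere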
This gives, in your example:
10 0 10 0 0 0 0 350 0 105 0
0 20 0 0 500 0 0 0 0 0 15
0 0 0 0 0 400 0 0 20 0 0
0 0 0 30 0 0 67 0 0 0 0
Or, if you do mind the zeros but not the order, you can sort this result by rows:
sol2 = sort(sol,2,'descend')
which gives
350 105 10 10 0 0 0 0 0 0 0
500 20 15 0 0 0 0 0 0 0 0
400 20 0 0 0 0 0 0 0 0 0
67 30 0 0 0 0 0 0 0 0 0
EDIT: the order of the non-zero entries can be preserved using the same trick as here
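One possible way to keep the original order, not necessarily the trick linked above, is to sort the mask rather than the values. This is a minimal sketch on the sample data from the question. The names col, row and sol3 are made up for illustration. It relies on sort keeping tied elements in their original order.

```matlab
% Sample data from the question
Info     = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];
number_Ids = 4;

% Mask and masked values, as above
aux = bsxfun(@eq, Info_IDs, (1:number_Ids).');
sol = bsxfun(@times, Info, aux);

% Sort the negated mask along each row: the selected columns (value 0)
% come first, and ties keep their original left-to-right order
[~, col] = sort(~aux, 2);
row = repmat((1:number_Ids).', 1, numel(Info));
sol3 = sol(sub2ind(size(sol), row, col));

% Drop the trailing columns that hold only padding
sol3 = sol3(:, 1:max(sum(aux, 2)))
```

Because the sort works on the mask and not on the values, zero or negative entries in Info stay where they belong. Note that aux and sol are number_Ids-by-numel(Info) matrices, so this may need a lot of memory for the sizes mentioned in the question.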
Upvotes: 0
Reputation: 32930
Here's a vectorized solution that should be both memory efficient and work fast even on large matrices:
%// Pad data with zero values and add matching IDs
len = histc(Info_IDs, 1:max(Info_IDs));      %// number of values per ID
padlen = max(len) - len;                     %// number of zeros needed per ID
pad_IDs = repelem(1:numel(len), padlen);     %// ID for each padding zero (repelem requires R2015a or later)
Info = [Info, zeros(1, numel(pad_IDs))];
Info_IDs = [Info_IDs, pad_IDs];
%// Group data into rows
Result = accumarray(Info_IDs(:), Info, [], @(x){x}).';
Result = [Result{:}].';
The second step can also be performed as follows:
%// Group data into rows
[sorted_IDs, sorted_idx] = sort(Info_IDs);
Result = reshape(Info(sorted_idx), [], numel(len)).';   %// one column per ID, then transpose
For example:
%// Sample input data
Info = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];
%// Pad data with zero values and add matching IDs
len = histc(Info_IDs, 1:max(Info_IDs));      %// number of values per ID
padlen = max(len) - len;                     %// number of zeros needed per ID
pad_IDs = repelem(1:numel(len), padlen);     %// ID for each padding zero (repelem requires R2015a or later)
Info = [Info, zeros(1, numel(pad_IDs))]
Info_IDs = [Info_IDs, pad_IDs]
%// Group data into rows
Result = accumarray(Info_IDs(:), Info, [], @(x){x}).';
Result = [Result{:}].';
The result is:
Result =
10 10 350 105
20 500 15 0
400 20 0 0
30 67 0 0
Upvotes: 1
Reputation: 45752
I don't know about avoiding loops, but this is pretty fast:
Result = [];
n = 4; % number of classes (distinct IDs)
for c = 1:n
    row = Info(Info_IDs == c);      % all values belonging to ID c
    Result(c, 1:numel(row)) = row;  % shorter rows stay zero-padded
end
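And if speed really is an issue, you can preallocate instead of starting from an empty matrix: Result = zeros(n, sum(Info_IDs == mode(Info_IDs))). Here sum(Info_IDs == mode(Info_IDs)) is the number of values for the most frequent ID, i.e. the width of the longest row.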
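Put together, the preallocated loop might look like the sketch below, using the sample data from the question. Taking n = max(Info_IDs) is an assumption that the IDs are the integers 1..n.

```matlab
% Sample data from the question
Info     = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];

n = max(Info_IDs);                           % assumes IDs are 1..n
width = sum(Info_IDs == mode(Info_IDs));     % count for the most frequent ID
Result = zeros(n, width);                    % preallocate, zero-padded

for c = 1:n
    row = Info(Info_IDs == c);               % values for ID c, in original order
    Result(c, 1:numel(row)) = row;
end
```

Logical indexing keeps the values of each ID in their original order, so no sorting is needed.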
Upvotes: 0