Piwie

Reputation: 33

Group values into rows

I have a vector of information, say:

Info = [10, 20, 10, 30, 500, 400, 67, 350, 20, 105, 15];

and another vector of IDs, say:

Info_IDs = [1, 2, 1, 4, 2, 3, 4, 1, 3, 1, 2];

I would like to obtain a matrix that is defined as follows:

Result =
    10    10   350   105
    20   500    15     0
   400    20     0     0
    30    67     0     0

Where every row shows the values of Info corresponding to a different ID. As seen in this short example, the number of values per ID differs in each row.

I'm working with large amounts of data (Info is 1x1000000 and Info_IDs is 1x25000), so I would prefer to build this Result matrix without loops. One option I considered is to compute a histogram per ID and store that instead (Result would then contain the binned info rather than the original values).
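
A minimal sketch of that histogram-per-ID idea, assuming hypothetical bin edges `edges` chosen to cover every value in Info (the edges and the names `bin` and `Counts` are illustrative, not part of the original data):

```matlab
Info     = [10, 20, 10, 30, 500, 400, 67, 350, 20, 105, 15];
Info_IDs = [1, 2, 1, 4, 2, 3, 4, 1, 3, 1, 2];

edges = [0 50 200 1000];          % hypothetical bin edges, must span all of Info
[~, bin] = histc(Info, edges);    % bin(k) is the index of the bin holding Info(k)

% Count (ID, bin) pairs: row = ID, column = bin
Counts = accumarray([Info_IDs(:), bin(:)], 1, [max(Info_IDs), numel(edges)]);
```

Counts is max(Info_IDs)-by-numel(edges), so its size depends on the number of bins rather than on the longest row. The last column counts only values exactly equal to edges(end), which is how histc treats its final edge.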

Thank you all in advance for your input.

Upvotes: 1

Views: 88

Answers (3)

Luis Mendo

Reputation: 112679

If you don't mind having zeros in between:

number_Ids = 4; % set as required
aux = bsxfun(@eq, Info_IDs, (1:number_Ids).');   % row k marks the entries with ID k
sol = bsxfun(@times, Info, aux)

This gives, in your example:

10     0    10     0     0     0     0   350     0   105     0
 0    20     0     0   500     0     0     0     0     0    15
 0     0     0     0     0   400     0     0    20     0     0
 0     0     0    30     0     0    67     0     0     0     0

Or, if you do mind the zeros but not the order, you can sort this result by rows (this pushes the zeros to the end only if all values in Info are positive):

sol2 = sort(sol,2,'descend')

which gives

350   105    10    10     0     0     0     0     0     0     0
500    20    15     0     0     0     0     0     0     0     0
400    20     0     0     0     0     0     0     0     0     0
 67    30     0     0     0     0     0     0     0     0     0

EDIT: the order of the non-zero entries can be preserved using the same trick as here
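
One possible way to keep the original order, sketched here under the assumption that a stable sort of the logical mask is acceptable (this is not necessarily the linked trick; `order`, `rows`, and `sol3` are illustrative names):

```matlab
Info     = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];
number_Ids = 4;

aux = bsxfun(@eq, Info_IDs, (1:number_Ids).');   % row k marks the entries with ID k
sol = bsxfun(@times, Info, aux);                 % values with zeros in between

% sort is stable, so within each row the marked columns come first
% and keep their original left-to-right order
[~, order] = sort(~aux, 2);
rows = repmat((1:number_Ids).', 1, size(sol, 2));
sol3 = sol(sub2ind(size(sol), rows, order));

% Drop the trailing all-zero columns
sol3 = sol3(:, 1:max(sum(aux, 2)));
```

Because the sort acts on the mask rather than on the values, entries of Info that are themselves zero or negative stay in place.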

Upvotes: 0

Eitan T

Reputation: 32930

Here's a vectorized solution that should be both memory-efficient and fast, even on large matrices:

%// Pad data with zero values and add matching IDs
len = histc(Info_IDs, 1:max(Info_IDs));      %// number of values per ID
padlen = max(len) - len;                     %// zeros needed per ID
starts = cumsum([1, padlen(1:end - 1)]);     %// first pad position of each ID
padval = accumarray(starts(:), 1, [sum(padlen) + 1, 1]).';
Info = [Info, zeros(1, sum(padlen))];
Info_IDs = [Info_IDs, cumsum(padval(1:end - 1))];

%// Group data into rows
Result = accumarray(Info_IDs(:), Info, [], @(x){x}).';
Result = [Result{:}].';

The second step can also be performed as follows:

%// Group data into rows (sort is stable, so the order within each ID is kept)
[sorted_IDs, sorted_idx] = sort(Info_IDs);
Result = reshape(Info(sorted_idx), [], numel(len)).';

Example

%// Sample input data
Info = [10 20 10 30 500 400 67 350 20 105 15];
Info_IDs = [1 2 1 4 2 3 4 1 3 1 2];

%// Pad data with zero values and add matching IDs
len = histc(Info_IDs, 1:max(Info_IDs));      %// number of values per ID
padlen = max(len) - len;                     %// zeros needed per ID
starts = cumsum([1, padlen(1:end - 1)]);     %// first pad position of each ID
padval = accumarray(starts(:), 1, [sum(padlen) + 1, 1]).';
Info = [Info, zeros(1, sum(padlen))]
Info_IDs = [Info_IDs, cumsum(padval(1:end - 1))]

%// Group data into rows
Result = accumarray(Info_IDs(:), Info, [], @(x){x}).';
Result = [Result{:}].';

The result is:

Result =
    10    10   350   105
    20   500    15     0
   400    20     0     0
    30    67     0     0

Upvotes: 1

Dan

Reputation: 45752

I don't know about not using loops but this is pretty fast:

Result = [];
n = 4; % number of classes (distinct IDs)
for c = 1:n
    row = Info(Info_IDs == c);          % all values belonging to ID c
    Result(c, 1:numel(row)) = row;      % shorter rows are zero-padded automatically
end

And if speed really is an issue, you can preallocate by replacing Result = [] with Result = zeros(n, sum(Info_IDs == mode(Info_IDs))), which sizes the matrix for the most frequent ID.
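
A minimal sketch of the loop with preallocation in place, assuming the IDs are the integers 1 to max(Info_IDs) (`counts` is an illustrative name, used here instead of mode to get the longest row):

```matlab
Info     = [10, 20, 10, 30, 500, 400, 67, 350, 20, 105, 15];
Info_IDs = [1, 2, 1, 4, 2, 3, 4, 1, 3, 1, 2];

n = max(Info_IDs);                             % number of classes
counts = accumarray(Info_IDs(:), 1, [n, 1]);   % number of values per ID
Result = zeros(n, max(counts));                % preallocate with zero padding

for c = 1:n
    row = Info(Info_IDs == c);                 % values of ID c, in original order
    Result(c, 1:numel(row)) = row;
end
```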

Upvotes: 0
