Reputation: 159
I have a matrix with repeated values in the 1st column, for example:
A = [
1 34 463;
2 45 684;
2 23 352;
3 31 256;
1 46 742;
4 25 234]
Using A, I am looking to extract the data from the 2nd column for each value in the 1st column and output it as B. Where a value in the 1st column is repeated, the corresponding values from the 2nd column are put in additional output columns (NaNs can be used where no repetition occurs). For example:
B = [
1 34 46;
2 45 23;
3 31 NaN;
4 25 NaN]
(The 1st column in B is not necessary, but is included here for clarification.)
I have attempted to use a combination of find, if statements and loops, but without success. Ideally, a successful approach would also be efficient, as the actual dataset is large.
I use version R2012a. Please advise.
Reputation: 3898
You could use cell arrays for this kind of problem. Cell arrays are useful when the rows or columns do not all have the same length: each cell can hold an array of a different size, so no padding is needed to make them equal.
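As a minimal illustration (the grouped column-2 values from the question's example are typed in by hand here, and C is just an illustrative name), a cell array can hold a row of a different length in each cell:

```matlab
%// Each cell holds the column-2 values of one group; the lengths differ
C = {[34 46], [45 23], 31, 25};
lens = cellfun(@numel, C)   %// number of values stored in each cell
```

Here lens is [2 2 1 1], and no NaN padding was needed to store the groups.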
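One approach using accumarray:
[~,~,idx] = unique(A(:,1));   %// group ID for each row, based on column 1
%// Collect the column-2 values of each group into a cell. accumarray does not
%// guarantee the order of the values within a group when idx is unsorted.
outC = accumarray(idx,A(:,2),[],@(x) {x.'}) %//'
%// If you want the values within each group sorted in ascending order, use this instead
%// outC = accumarray(idx,A(:,2),[],@(x) {sort(x).'})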
outC =
[1x2 double]
[1x2 double]
[ 31]
[ 25]
You can access each cell with brace indexing, e.g. outC{1}:
>> outC{1}
ans =
46 34
If you want to view the contents of all the cells at once, you could use the celldisp function:
>> celldisp(outC)
outC{1} =
46 34
outC{2} =
23 45
outC{3} =
31
outC{4} =
25
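If you need the values in their original row order (34 46 and 45 23, as in B), one option is to sort the subscripts, and the data with them, before calling accumarray; the accumarray documentation describes this for functions that depend on the order of their inputs. A sketch assuming the question's A (outOrdered is an illustrative name):

```matlab
A = [1 34 463; 2 45 684; 2 23 352; 3 31 256; 1 46 742; 4 25 234];
[~,~,idx] = unique(A(:,1));   %// group ID for each row
%// sort is stable, so rows with equal IDs keep their original relative order
[sidx,p] = sort(idx);
%// With sorted subscripts, each group's values reach the function in data order
outOrdered = accumarray(sidx, A(p,2), [], @(x) {x.'});
celldisp(outOrdered)
```

This gives 34 46, 45 23, 31 and 25 in the four cells, i.e. the second and third columns of B.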
If you want the output as a NaN-padded matrix instead of a cell array, you could do something like this (after you have obtained outC above).
Approach using bsxfun and cellfun:
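lens = cellfun(@numel,outC);            %// number of values in each group
maxSize = max(lens);                    %// longest group = number of output columns
out = nan(maxSize,numel(outC));         %// one column per group, NaN-filled
mask = bsxfun(@le,(1:maxSize).',lens(:).')  %// true in the first lens(k) rows of column k
out(mask) = horzcat(outC{:});           %// fill the masked places column by column
out = out.'                             %// transpose: one row per group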
Output:
out =
46 34
23 45
31 NaN
25 NaN
If you use the alternative (sorted) approach to compute outC, the result would be:
out =
34 46
23 45
31 NaN
25 NaN
Reputation: 221614
This would be one approach -
[~,~,idx] = unique(A(:,1),'stable') %// Find IDs for each element from col-1 ('stable' numbers groups by first appearance; drop it to order groups by key value)
[~,sorted_idx] = sort(idx) %// Row order that groups rows by ID (sort is stable, so rows within a group keep their original order)
grp_vals = A(sorted_idx,2) %// Get second column elements grouped together
grp_lens = accumarray(idx,1) %// Find group lengths
%// Create a mask for a 2D array where the ones are the places where grouped
%// elements are to be put.
mask = bsxfun(@le,(1:max(grp_lens)).',grp_lens(:).')
%// Create a NaN-filled array of the same shape as mask and finally fill the masked
%// places with the grouped elements. Transpose at the end to get the desired output.
out = nan(size(mask))
out(mask) = grp_vals
out = out.'
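To see what the mask does, here is the masking step on its own for the question's original A. For that A, the code above gives grp_lens = [2;2;1;1] and grp_vals = [34;46;45;23;31;25]; in this sketch those two vectors are typed in by hand:

```matlab
grp_lens = [2; 2; 1; 1];              %// group lengths for the question's A
grp_vals = [34; 46; 45; 23; 31; 25];  %// column-2 values, grouped
%// Column k of mask has ones in its first grp_lens(k) rows
mask = bsxfun(@le, (1:max(grp_lens)).', grp_lens(:).')
out = nan(size(mask));
out(mask) = grp_vals;  %// column-major fill of the true positions
out = out.'
```

mask is [1 1 1 1; 1 1 0 0], and out comes out as [34 46; 45 23; 31 NaN; 25 NaN], which is the B from the question without its first column.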
Sample run, using a modified A that includes a zero key, negative keys and a key that occurs three times:
>> A,out
A =
1 34 463
2 45 684
0 23 352
-3 31 256
1 46 742
4 25 234
1 12 99
-3 -20 56
out =
34 46 12
45 NaN NaN
23 NaN NaN
31 -20 NaN
25 NaN NaN
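For comparison with the loop-based attempt mentioned in the question, here is a plain loop sketch of the same grouping (not benchmarked; the vectorized versions above avoid the per-row loop). It orders the groups by key value, keeps the original row order within each group, and prepends the keys to reproduce B exactly as shown in the question; nextCol is an illustrative name:

```matlab
A = [1 34 463; 2 45 684; 2 23 352; 3 31 256; 1 46 742; 4 25 234];
[keys,~,idx] = unique(A(:,1));        %// sorted keys and a group ID per row
counts = accumarray(idx(:), 1);       %// rows per group
out = nan(numel(keys), max(counts));  %// preallocate, NaN where no repetition
nextCol = zeros(numel(keys), 1);      %// next free column in each output row
for r = 1:size(A,1)
    g = idx(r);
    nextCol(g) = nextCol(g) + 1;
    out(g, nextCol(g)) = A(r,2);
end
B = [keys(:) out]
```

Preallocating out with nan before the loop avoids growing the matrix inside it.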