MATLAB - Find repeated values in a column and extract values from that row

Question

I have a matrix with repeated values in the 1st column, for example:

A = [
1  34  463;
2  45  684;
2  23  352;
3  31  256;
1  46  742;
4  25  234]

Using A, I am looking to extract data from the 2nd column for each value in the 1st column to output B. Where repetition occurs for a value in the 1st column, the corresponding values in the 2nd column are put in an additional output column (NaNs can be used where no repetition occurs). For example:

B = [
1  34  46;
2  45  23;
3  31  NaN;
4  25  NaN]

(The 1st column in B is not necessary, but is included here for clarification)

I have attempted to use a combination of find functions, if statements and loops, but without success. Ideally, a successful approach would also be efficient, as the actual dataset is large.

I use version R2012a. Please advise.

Santhan Salai · Accepted Answer

You could use cell arrays for this kind of problem. Cell arrays are useful when the rows or columns do not all have the same length: each cell can hold a different number of elements, so no padding is needed to make them equal in size.
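As a small illustration of that point, here is a minimal sketch of a cell array holding groups of different lengths. The variable names C, n1 and n2 are made up for this example and are not part of the answer's code.

```matlab
% Hypothetical example: two groups with different numbers of values,
% stored side by side without any padding
C = {[34 46]; 31};
n1 = numel(C{1});   % 2 values in the first cell
n2 = numel(C{2});   % 1 value in the second cell
```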

One approach using accumarray

[~,~,idx] = unique(A(:,1));
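% Note: accumarray does not guarantee the order of the values within each
% group unless idx is sorted, so a group may come out as [46 34] rather than [34 46].
outC = accumarray(idx,A(:,2),[],@(x) {x.'})
% If you want each group's values in sorted order, use the following instead:
% outC = accumarray(idx,A(:,2),[],@(x) {sort(x).'})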

outC = 

[1x2 double]
[1x2 double]
[        31]
[        25]

You can access an individual cell with brace indexing, for example outC{1}:

>> outC{1}

ans =

46    34

If you want to view the contents of every cell at once, you could use the celldisp function:

>> celldisp(outC)

outC{1} =
46    34

outC{2} =
23    45

outC{3} =
31

outC{4} =
25
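The values above come out as 46 34 rather than in the order they appear in A (34 46), because accumarray only preserves input order within a group when the subscripts are sorted. If the original order matters, one possible variant is to sort idx first. This is a sketch, not part of the original answer, and idxSorted and order are names introduced here. It relies on MATLAB's sort being stable.

```matlab
A = [1 34 463; 2 45 684; 2 23 352; 3 31 256; 1 46 742; 4 25 234];

[~,~,idx] = unique(A(:,1));
% sort is stable, so rows with equal idx keep their original relative order
[idxSorted, order] = sort(idx);
% with sorted subscripts, each group is passed to the function in input order
outC = accumarray(idxSorted, A(order,2), [], @(x) {x.'});
% outC{1} is [34 46] and outC{2} is [45 23]
```

The NaN-padding step works on this outC unchanged.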

If you want the output as a NaN-padded matrix instead of a cell array, you could do something like this (after you have obtained outC as above):

Approach using bsxfun and cellfun

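lens = cellfun(@numel,outC);                   % number of values in each group
maxSize = max(lens);                           % largest group size
out = nan(maxSize,numel(outC));                % one column per group, NaN by default
mask = bsxfun(@le,(1:maxSize).',lens(:).');    % mask(i,j) is true when i <= lens(j)
out(mask) = horzcat(outC{:});                  % fill the true positions column by column
out = out.'                                    % transpose so each group is a row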

Output:

out =

46    34
23    45
31   NaN
25   NaN

If you use the alternative (sorted) version to build outC, the result would be:

out =

34    46
23    45
31   NaN
25   NaN
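Since the dataset is large, it may also be worth trying to build the padded matrix directly, without the intermediate cell array. The following is a minimal sketch under that assumption and is not part of the original answer. The names keys, counts, starts and pos are introduced here for illustration. It keeps the values in their original order and prepends the key column, which reproduces the B from the question.

```matlab
A = [1 34 463; 2 45 684; 2 23 352; 3 31 256; 1 46 742; 4 25 234];

[keys,~,idx] = unique(A(:,1));          % keys: distinct values in column 1
[idxSorted, order] = sort(idx);         % stable sort groups equal keys together
counts = accumarray(idx, 1);            % number of rows per key
starts = cumsum([1; counts(1:end-1)]);  % first sorted position of each group
% position of each sorted row within its own group (1, 2, ...)
pos = (1:numel(idx)).' - starts(idxSorted) + 1;

B = nan(numel(keys), max(counts));      % NaN where a key has fewer values
B(sub2ind(size(B), idxSorted, pos)) = A(order,2);
B = [keys B];                           % optional: prepend the key column
```

For the example A this gives the rows 1 34 46, 2 45 23, 3 31 NaN and 4 25 NaN. Whether it is actually faster than the accumarray and cell array route would need timing on the real data.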
