user3743235

Reputation: 159

MATLAB - Find repeated values in a column and extract values from that row

I have a matrix with repeated values in the 1st column, for example:

A = [
1  34  463;
2  45  684;
2  23  352;
3  31  256;
1  46  742;
4  25  234]

Using A, I am looking to extract data from the 2nd column for each value in the 1st column to output B. Where repetition occurs for a value in the 1st column, the corresponding values in the 2nd column are put in an additional output column (NaNs can be used where no repetition occurs). For example:

B = [
1  34  46;
2  45  23;
3  31  NaN;
4  25  NaN]

(The 1st column in B is not necessary, but is included here for clarification)

I have attempted to use a combination of find functions, if statements and loops, but without success. Ideally, a successful approach would also be efficient, as the actual dataset is large.

I use version R2012a. Please advise.

Upvotes: 2

Views: 909

Answers (2)

Santhan Salai

Reputation: 3898

A cell array suits this kind of problem. Each cell can hold a vector of a different length, so groups with different numbers of values can be stored side by side without padding them to a common size.

One approach uses accumarray:

[~,~,idx] = unique(A(:,1));
outC = accumarray(idx,A(:,2),[],@(x) {x.'})    %//'
%// If you want the outputs in sorted order use the following code instead
%// outC = accumarray(idx,A(:,2),[],@(x) {sort(x).'})

outC = 

[1x2 double]
[1x2 double]
[        31]
[        25]

You can access an individual cell with brace indexing, for example outC{1}:

>> outC{1}

ans =

46    34

To display the contents of every cell at once, use the celldisp function:

>> celldisp(outC)

outC{1} =
46    34

outC{2} =
23    45

outC{3} =
31

outC{4} =
25
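
Note that outC{1} above comes out as 46 34 rather than the 34 46 order in which those values appear in A. As far as I know, accumarray only keeps the input order within a group when the subscripts are already sorted. Here is a minimal sketch, assuming you want each group in its original order: sort idx first (MATLAB's sort is stable) and reorder the values the same way. The names idxSorted, order and outOrdered are just illustrative and not part of the code above.

```matlab
A = [1 34 463; 2 45 684; 2 23 352; 3 31 256; 1 46 742; 4 25 234];

[~,~,idx] = unique(A(:,1));        % group ID for each row of A
[idxSorted,order] = sort(idx);     % stable sort keeps original order within a group
% With sorted subscripts, each group reaches the function in input order
outOrdered = accumarray(idxSorted, A(order,2), [], @(x) {x.'});
```

The padding step that follows should work unchanged if you substitute outOrdered for outC.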

If you want the output as a NaN-padded matrix instead of a cell array, you can do something like this once you have obtained outC as above.

Approach using bsxfun and cellfun:

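lens = cellfun(@numel,outC);                 %// number of values in each group
maxSize = max(lens);                         %// widest group sets the column count
out = nan(maxSize,numel(outC));              %// NaN-filled, one column per group
mask = bsxfun(@le,(1:maxSize).',lens(:).');  %// true where a group has a value
out(mask) = horzcat(outC{:});                %// fill column by column
out = out.'                                  %// transpose: one row per group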

Output:

out =

46    34
23    45
31   NaN
25   NaN

If you build outC with the sorted alternative (the commented-out line above), the result is:

out =

34    46
23    45
31   NaN
25   NaN

Upvotes: 3

Divakar

Reputation: 221614

This would be one approach -

[~,~,idx] = unique(A(:,1),'stable'); %// Find IDs for each element from col-1
[~,sorted_idx] = sort(idx);          %// Row order that groups equal IDs together
grp_vals = A(sorted_idx,2);          %// Get second column elements grouped together
grp_lens = accumarray(idx,1);        %// Find group lengths

%// Create a mask for a 2D array where the ones are places where grouped 
%// elements are to be put.
mask = bsxfun(@le,(1:max(grp_lens)).',grp_lens(:).');

%// Create a nan filled array of same shape as mask and finally fill masked 
%// places with grouped elements. Transpose at the end to get desired output.
out = nan(size(mask));
out(mask) = grp_vals;
out = out.'
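
To make the mask step concrete, here is a small sketch using the group lengths from the sample run below (3, 1, 1, 2, 1), hard-coded for illustration. Column k of the mask has grp_lens(k) leading ones, so a logical-indexed assignment fills the array one group per column.

```matlab
grp_lens = [3; 1; 1; 2; 1];   % illustrative group sizes
mask = bsxfun(@le, (1:max(grp_lens)).', grp_lens.');
% mask is a 3x5 logical array:
%   1 1 1 1 1
%   1 0 0 1 0
%   1 0 0 0 0
n_filled = nnz(mask);         % equals sum(grp_lens), one slot per grouped value
```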

Sample run -

>> A,out
A =
     1    34   463
     2    45   684
     0    23   352
    -3    31   256
     1    46   742
     4    25   234
     1    12    99
    -3   -20    56
out =
    34    46    12
    45   NaN   NaN
    23   NaN   NaN
    31   -20   NaN
    25   NaN   NaN
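
If you also want the key column shown in the question's B, a sketch along the same lines is below. It assumes the keys should appear in ascending order, so it uses the default sorted unique rather than 'stable'. The names keys, order, vals and B are illustrative. It sticks to bsxfun, which should be available in R2012a.

```matlab
A = [1 34 463; 2 45 684; 2 23 352; 3 31 256; 1 46 742; 4 25 234];

[keys,~,idx] = unique(A(:,1));    % sorted unique keys and a group ID per row
[~,order] = sort(idx);            % stable sort groups rows, keeps original order
vals = A(order,2);                % second-column values, grouped
lens = accumarray(idx,1);         % number of rows in each group

mask = bsxfun(@le, (1:max(lens)).', lens.');  % one column per group
out = nan(size(mask));
out(mask) = vals;                 % fills column by column
B = [keys, out.'];                % prepend the key column
```

For the A in the question this should reproduce the B shown there, with NaN padding for keys 3 and 4.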

Upvotes: 3
