Michiel Karrenbelt

Reputation: 143

MATLAB find uniques not in other list

I have a cell array of identifiers and a cell array of names that correspond to these identifiers (both are cell arrays of strings). The identifiers are unique, but the names are not. How can I obtain the identifiers whose names are not unique?

ismember and unique alone do not seem to be sufficient.

Upvotes: 0

Views: 52

Answers (2)

Benoit_11

Reputation: 13945

From what I understand, you're trying to find the IDs corresponding to non-unique names. Based on Andy's answer here, we can do the following:

Let's create a cell array D containing the names and IDs:

IDs = {'number1','number2','number3','number4','number5'}.';
names = {'first','second','third','first','fifth'}.';

D = [names IDs]

The next four lines are similar to Andy's solution. The idea is to first find one occurrence of each distinct name with unique, then remove those occurrences from a copy of the name list. Whatever is left over must be a repeated name. You can then easily find the IDs corresponding to those names.

Andy's code:

[~, uniqueIdx] = unique(D(:,1)); % Index of one occurrence of each distinct name
duplicates = D(:,1);             % Copy the original names into a working array
duplicates(uniqueIdx) = [];      % Remove one occurrence of each name; anything left is a duplicate
duplicates = unique(duplicates)  % Collapse the leftovers to the distinct duplicated names

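And now to find the IDs:

NonUniqueRows = ismember(D(:,1), duplicates) % true for every row whose name is duplicated; unlike strcmp, this also works when several names repeat

Out = D(NonUniqueRows,2)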

Here Out is

Out = 

    'number1'
    'number4'
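
As a possible alternative, the same result can be reached by counting how often each name occurs, using the third output of unique together with accumarray. This is only a sketch under the assumption that IDs and names are column cell arrays of equal length, as in the example above; the variable names grp, counts, isRepeated and nonUniqueIDs are made up for illustration.

```matlab
% Example data (same as above), stored as column cell arrays
IDs   = {'number1'; 'number2'; 'number3'; 'number4'; 'number5'};
names = {'first'; 'second'; 'third'; 'first'; 'fifth'};

% grp(k) is the index of names{k} within the sorted list of distinct names
[~, ~, grp] = unique(names);

% counts(g) is the number of times the g-th distinct name occurs
counts = accumarray(grp(:), 1);

% A row is "non-unique" if its name occurs more than once
isRepeated = counts(grp) > 1;

% IDs whose names are shared with at least one other ID
nonUniqueIDs = IDs(isRepeated)
```

For the example data this should again select 'number1' and 'number4'. Because the mask is built from per-name counts, it does not matter how many different names are repeated.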

Upvotes: 0

Luis Mendo

Reputation: 112699

Let

identifiers = {'a', 'bb', 'ccc', 'eeeee'};
names = {'a', 'bb', 'a', 'dddd', 'ffffff'};

To obtain the elements of identifiers that appear in names more than once, or that never appear:

[ii, jj] = ndgrid(1:numel(names), 1:numel(identifiers)); % all combinations of indices
matches = sum(strcmp(names(ii), identifiers(jj)), 1); % number of matches for each identifier
result_0 = identifiers(matches==0); % these identifiers never appear
result_1p = identifiers(matches>1); % these identifiers appear more than once

In this example,

result_0 = 
    'ccc'    'eeeee'
result_1p = 
    'a'

Upvotes: 1
