user2623706
user2623706

Reputation: 527

Stumped finding an algorithm for this problem re: finding set that does not belong

Given an array of sets find the one that does not belong:
example: [[a,b,c,d], [a,b,f,g], [a,b,h,i], [j,k,l,m]]
output: [j,k,l,m]

We can see above that the first three sets have a common subset [a,b] and the last one does not. Note: There may be a case where the outlier set does have elements contained in the input group. In this case we have to find the set that has the least in common with the other sets.

I have tried iterating over the input list and keeping a count for each character (in a hash).
In a second pass, find which set has the smallest total.
In the example above, the last set would have a sum of counts of 4:
j*1 + k*1 + l*1 + m*1.

I'd like to know if there are better ways to do this.

Upvotes: 2

Views: 501

Answers (2)

jaem
jaem

Reputation: 101

You shouldn't be looking for the smallest sum of count of elements. It is dependent on the size of the set. But if you substract the size of the set from the sum, it's 0 only if the set is disjoint from all the others. Another option, is to look at the maximum of the count of its elements. If the maximum is one on a set, then they only belong to the set.
There are many functions you can use. As the note states:

Note: There may be a case where the outlier set does have elements contained in the input group. In this case we have to find the set that has the least in common with the other sets.

The previous functions are not optimal. A better function would count the number of shared elements. Set the value of an element to 1 if it's in multiple sets and 0 if it appears only once.

Upvotes: 4

Prune
Prune

Reputation: 77837

Your description:

find the set that has the least in common with the other sets

Doing this as a general application would require computing similarity with each individual pair of sets; this does not seem to be what you describe algorithmically. Also, it's an annoying O(n^2) algorithm.

I suggest the follow amendment and clarification

find the set that least conforms to the mean of the entire list of sets.

This matches your description much better, and can be done in two simple passes, O(n*m) where you have n sets of size m.

The approach you outlined does the job quite nicely: count the occurrences of each element in all of the sets, O(nm). Then score each set according to the elements it contains, also O(nm). Keep track of which element has the lowest score.


For additional "accuracy", you could sort the scores and look for gaps in the scoring -- this would point out multiple outliers.


If you do this in Python, use the Counter class for your tally.

Upvotes: 5

Related Questions