Rasmus Faber
Rasmus Faber

Reputation: 49677

Finding smallest set of criteria for uniqueness

I have a collection of objects with properties. I want to find the simplest set of criteria that will specify exactly one of these objects (I do not care which one).

For example, given {a=1, b=1, c=1}, {a=1, b=2, c=1}, {a=1, b=1, c=2}, specifying b==2 (or c==2) will give me an unique object.

Likewise, given {a=1, b=1, c=1}, {a=1, b=2, c=2}, {a=1, b=2, c=1}, specifying b==2 and c==2 (or b==1 && c==1 or b==2 && c==1) will give me an unique object.

This sounds like a known problem, with a known solution, but I haven't been able to find the correct formulation of the problem to allow me to Google it.

Upvotes: 6

Views: 181

Answers (4)

Per
Per

Reputation: 2634

The freedom of choosing the target is sort of unusual. If the target is specified, then this is essentially the set cover problem. Here's two corresponding instances side by side.

A={1,2,3} B={2,4} C={3,4} D={4,5}

0: {a=0, b=0, c=0, d=0}  # separate 0 from the others
1: {a=1, b=0, c=0, d=0}
2: {a=1, b=1, c=0, d=0}
3: {a=1, b=0, c=1, d=0}
4: {a=0, b=1, c=1, d=1}
5: {a=0, b=0, c=0, d=1}

While set cover is NP-hard, however, your problem has an O(mlog n + O(1) poly(n)) algorithm where m is the number of attributes and n is the number of items (the optimal set of criteria has size at most log n), which makes it rather unlikely that an NP-hardness proof is forthcoming. I'm reminded of the situation with the Junta problem (basically the theoretical formulation of feature selection).

Upvotes: 2

Jacob
Jacob

Reputation: 34621

Your problem can be defined as follows:

1 1 1 -> A
1 2 1 -> B
1 1 2 -> C
.
.

where 1 1 1 is called the feature vector and A is the object class. You can then use decision trees (with pruning) to find a set of rules to classify objects. So, if your objective is to automatically decide the set of criteria to identify object A then, you can observe the path on the decision tree which leads to A.

If you have access to MATLAB, it is really easy to obtain a decision tree for your data.

Upvotes: 0

OSH
OSH

Reputation: 2937

It is indeed a known problem in AI - feature selection. There are many algorithms for doing this Just Google "feature selection" "artificial intelligence".

The main issue is that when the samples set is large, you need to use some sort of heuristics in order to reach a solution within a reasonable time.


Feature Selection in Data Mining

The main idea of feature selection is to choose a subset of input variables by eliminating features with little or no predictive information.

Upvotes: 4

Lieven Keersmaekers
Lieven Keersmaekers

Reputation: 58491

I don't know how easily this could be translated into an algoritm but using SQL, which is already set based, it could go like this

  • construct a table with all possible combinations of columns from the input table
  • select all combinations that appear equal to the amount of records present in the input table as distinct combinations.

SQL Script

;WITH q (a, b, c) AS (
  SELECT '1', '1', '1'
  UNION ALL SELECT '1', '2', '2'
  UNION ALL SELECT '1', '2', '1'
  UNION ALL SELECT '1', '1', '2'
)
SELECT  col
FROM    (
          SELECT val = a, col = 'a'  FROM q
          UNION ALL SELECT b, 'b' FROM q
          UNION ALL SELECT c, 'c' FROM q
          UNION ALL SELECT a+b, 'a+b' FROM q
          UNION ALL SELECT a+c, 'a+c' FROM q
          UNION ALL SELECT b+c, 'b+c' FROM q
          UNION ALL SELECT a+b+c, 'a+b+c' FROM q
        ) f
GROUP BY
        col        
HAVING
        COUNT(DISTINCT (val)) = (SELECT COUNT(*) FROM q)

Upvotes: 0

Related Questions