Tyler Durden
Tyler Durden

Reputation: 11558

Algorithm to iteratively discover a boolean rule defining a vector condition

I am interested in finding an algorithm to iteratively discover an boolean rule which defines a condition of a vector. For example, let's say the vector is all the letters in a book and the rule purports to tell you whether the book was written by James Joyce, but we don't know what the rule is, we want to discover it. The rule engine will always answer True or False for any submitted vector. So, for example, if we submit a vector array containing all the characters in "Ulysses", then the engine might respond "True", if the rule is a good one.

So, imagine we split "Ulysses" into two halves, A and B, and submit each half to the engine separately. The engine answers True to A, but False to B. From this we can infer that whatever the rule is looking for is only found in A. So, now we divide A into two halves again, A1 and A2. The engine this time says False for both. Now, we can infer there must be an AND condition in the rule, and one of the atoms of the AND condition must be in A1 and the other in A2. For example, the rule might be: "if the words 'meatjuice' and 'carracarracarra' are in the vector return True. This is consistent with our test results because "meatjuice" is in the first quarter of the book, and "carracarracarra" is in the second quarter.

By successively dividing our text and re-submitting to the engine we can eventually discover the hidden rule the engine is using.

I suspect that there already exists an algorithm to do this, but I do not know what it is called or how to go about finding it.

Upvotes: 2

Views: 71

Answers (1)

David Eisenstat
David Eisenstat

Reputation: 65498

I would describe this problem as learning monotone DNF formulas using membership queries. Angluin (Queries and Concept Learning, 1988) gave an algorithm similar to yours, but it uses equivalence queries (i.e., find an example on which the current hypothesis doesn't work) as well as membership queries. The problem otherwise is that finding that last term may take quite a while. For example, suppose that the rule is

   (A1 && B1) || (A2 && B1) || ... || (An && B1)
|| (A1 && B2) || (A2 && B2) || ... || (An && B2)
|| (B1 && B2).

The hypothesis consisting of the first two lines only differs from the rule with respect to exactly one out of 2^(n + 2) inputs.

Upvotes: 1

Related Questions