Reputation: 441
Background. I have numbers 1 to 20 (white digits on black background) that can appear on screen, and I wish to identify these numbers. As they cannot be simply copy-pasted, I will be comparing the position of the white pixels of the number on-screen with a list of positions of white pixels of all 20 numbers. However, each number can have a large number of pixels, and comparing all of those pixels may not be necessary to identify that number. Hence, I'd like to make the least number of comparisons as possible.
Algorithmic question: I have multiple sets with elements that are unique within each set, but may not be unique across all sets. How can I find the smallest possible subset of each set such that each subset is unique?
Example 1: Let A = {1, 2}, and B = {3, 4}. The smallest subset of A and B would be {1} and {3} (or {2} and {4}), as those subsets are unique to each original set, and are as small as possible.
Example 2: Let A = {1, 2, 3, 4}, B = {1, 2, 3, 5}, C = {1, 2, 4, 5}. Possible smallest subsets are {3, 4}, {3, 5}, {4, 5}. If any element were removed from any of those subsets, then that subset can also belong to a different set. E.g. removing 4 from the first subset will leave {3}, which makes it ambiguous if the {3} identifies the first or the second set.
Upvotes: 1
Views: 1932
Reputation: 11
I recently wrote a Python package that is intended to solve your algorithmic question efficiently: https://github.com/alussana/TrieSUS
I stumbled upon an equivalent problem, and I was quite surprised not to find a name for this algorithmic question. I could only find brute-force approaches that involve enumerating and comparing powersets to find the solution(s) for each set - which is very inefficient, and especially slow when considering sets for which a solution doesn't exist.
My algorithm uses a trie data structure and a series of linear time operations to first greatly reduce the problem size, and to then transform it into the equivalent of a set cover problem, the solution of which is extracted using OR-Tools's constrained programming solver. More information about the algorithm and its performance can be found in the repository.
Upvotes: 0
Reputation: 165
It's a solution with O(n^3) time complexity, and O(n) memory complexity. (If I am not wrong)
function isElement(elem, s) {
return s.includes(elem)
}
function isId(id, sets) {
let setsWithSuchElementsNumber = 0
for (const s of sets) {
if (id.every((e) => isElement(e, s))) {
setsWithSuchElementsNumber++
}
}
return setsWithSuchElementsNumber === 1
}
function getSetId(s, sets) {
const count = {}
const elements = []
for (const elem of s) {
if (count[elem] == null) {
elements.push(elem)
}
count[elem] = 0
}
for (const otherSet of sets) {
for (const e of elements) {
if (isElement(e, otherSet)) {
count[e]++
}
}
}
elements.sort((first, second) => {
return Math.sign(count[first] - count[second])
})
for (let idSize = 1; idSize <= elements.length; idSize++) {
const possibleId = elements.slice(0, idSize)
if (isId(possibleId, sets)) {
return possibleId
}
}
return null
}
const getSetIds = (sets) => {
return sets.map((s) => getSetId(s, sets))
}
const res = getSetIds([
[1, 2, 3, 4],
[1, 2, 3, 5],
[1, 2, 4, 5],
])
console.log(res.join(' '))
Upvotes: 1