Reputation: 6052
I have faced the following problem:
A
and B
a
, b
) (a
belongs to set A
, where b
belongs to set B
) there a probability pij
is known in advance. It represents the probability (certainty level) that a
matches b
, or in other words, how closely a
matches b
(and vice-versa, because pij
== pji
).a
, b
) which describe the matchingA simple practical example in which such algorithm is required is described below (this is not actually the problem I am solving!):
a
, b
) we run a pattern matcher to determine the probability that letter a
written by person A
represents letter b
wrote by person B
. This gives us the
probability matrix which expresses some kind of similarity correlation
for each pair of letters (a
, b
)A
wrote,
we need to find the corresponding
letter written by person B
Current approach:
I am wondering if I could just assign weights which are proportional to the logarithm of certainty level / probability that element a
from set A
matches element b
from set B
and then run maximum weighted bipartite matching to find the maximum sum. The logarithm is because I want to maximize the total probability of multiple matching, and since single matches (represented as pairs of matched elements a
- b
) form a chain of events, which is a product of probabilities, by taking the logarithm we converts this to a sum of probabilities, which is then easily maximized using an algorithm for weighted bipartite matching, such as Hungarian algorithm. But I somehow doubt this approach would ensure the best matching in terms of statistical expected maximum.
After searching a bit, the closest problem I found was a two-stage stochastic maximum weighted matching problem, which is NP-hard, but I actually need some kind of "one-stage" stochastic maximum weighted matching problem.
Upvotes: 0
Views: 662
Reputation: 21
I wonder if you can use MaxFlow/MinCut. I can't prove it's optimal at the moment, but your problem may be NP-hard anyway. You can use MF/MC to find a perfect matching when you have a bipartite graph with V=(A,B) by creating a source connected to all nodes in A with a weight of 1 and a sink connected to all nodes in B with weight 1. I'm proposing you make the weights of edges that cross from A to B be the probabilities you mentioned above. What do you think?
Upvotes: 1