How to calculate the average pointwise mutual information (PMI) of a query that has more than two strings?

Question

Let's say we have a query that consists of the following four strings: w1, w2, w3 and w4.

The pointwise mutual information (PMI) between two strings is defined as: PMI(w_i, w_j) = log(p(w_i, w_j) / (p(w_i) * p(w_j)))

To find the average PMI, one would naturally calculate the PMI for every pair of strings and average the results. But what do we do when a pair under consideration has no common documents?

Example: Let's say w1 and w2 have no common documents, which means that p(w1, w2) = 0 and the PMI is log(0), i.e. negative infinity. How do we take an average then? Do we ignore the pairs whose PMI is infinite? If we do ignore such pairs, what should we do when no pair of strings in the query has any common documents?
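To make the problem concrete, here is a minimal Python sketch of the pairwise averaging described above. It assumes probabilities are estimated as document frequencies, that each string maps to the set of document ids containing it, and that every string occurs in at least one document. The names `pmi`, `average_pmi` and `postings`, the toy data, and the optional `smoothing` constant are all hypothetical and only there for illustration. Adding a constant to the co-occurrence count is just one crude workaround, not something the formula above prescribes.

```python
import math
from itertools import combinations


def pmi(docs_a, docs_b, n_docs, smoothing=0.0):
    """PMI of two strings, given the sets of document ids that contain them.

    smoothing is a hypothetical constant added to the co-occurrence count;
    with the default of 0.0 this is the plain PMI formula.
    """
    p_a = len(docs_a) / n_docs
    p_b = len(docs_b) / n_docs
    p_ab = (len(docs_a & docs_b) + smoothing) / n_docs
    if p_ab == 0:
        # No common documents: log(0) is treated as negative infinity.
        return float("-inf")
    return math.log(p_ab / (p_a * p_b))


def average_pmi(postings, n_docs, smoothing=0.0):
    """Mean PMI over all unordered pairs of strings in the query."""
    pairs = list(combinations(sorted(postings), 2))
    values = [pmi(postings[a], postings[b], n_docs, smoothing) for a, b in pairs]
    return sum(values) / len(values)


# Toy data (made up): w1 and w2 share no documents.
postings = {
    "w1": {1, 2, 3},
    "w2": {4, 5},
    "w3": {2, 3, 4},
    "w4": {1, 4},
}

print(average_pmi(postings, n_docs=10))                 # one pair is -inf
print(average_pmi(postings, n_docs=10, smoothing=0.5))  # smoothed variant
```

With the unsmoothed call, the single pair without common documents drags the whole average to negative infinity, which is exactly the situation the question is about.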
