Reputation: 585
I'm trying to calculate the normalized mutual information between two 256*256 image label maps, each flattened into a 1-D array.
sklearn's documentation is clear that the function normalized_mutual_info_score should only output values between 0 and 1.
However, when comparing lists with many elements, it sometimes gives me negative values or values larger than 1.
Is it an overflow/underflow problem? (I get a more realistic answer if I change average_method to "arithmetic", "min", or "max", but I'm not sure which one would be the most appropriate in my case.)
I'm using sklearn 0.20.0. Here is a synthetic example to reproduce the problem:
from sklearn import metrics

metrics.normalized_mutual_info_score([0]*100001, [0]*100000 + [1])
metrics.normalized_mutual_info_score([0]*110001, [0]*110000 + [1])
I expect both calls to return 0, but instead I get 7.999 and -7.999 respectively.
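For comparison, here is a small sketch (against sklearn 0.20.0, where the valid options for average_method are 'min', 'geometric', 'arithmetic', and 'max') that runs each averaging method on the first example:

from sklearn import metrics

labels_a = [0] * 100001
labels_b = [0] * 100000 + [1]

# Run every available averaging method on the same pair of label lists
# to see which ones stay within [0, 1].
for method in ("min", "geometric", "arithmetic", "max"):
    score = metrics.normalized_mutual_info_score(labels_a, labels_b, average_method=method)
    print(method, score)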
Upvotes: 3
Views: 1459
Reputation: 16966
As you have mentioned, setting average_method gives sensible values:
from sklearn.metrics import normalized_mutual_info_score

normalized_mutual_info_score([0]*100001, [0]*100000 + [1], average_method='arithmetic')
# 3.166757680223739e-14
I would suggest using 'arithmetic',
since it is going to be the default value in the next version, 0.22 (reference).
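For your image-label use case, a minimal sketch (img_labels_1 and img_labels_2 are hypothetical placeholders for your two flattened 256*256 label arrays):

import numpy as np
from sklearn import metrics

# Hypothetical 256*256 label maps, flattened to 1-D arrays.
img_labels_1 = np.random.randint(0, 5, size=(256, 256)).ravel()
img_labels_2 = np.random.randint(0, 5, size=(256, 256)).ravel()

# Passing average_method explicitly keeps the result stable across versions.
nmi = metrics.normalized_mutual_info_score(img_labels_1, img_labels_2, average_method='arithmetic')
print(nmi)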
Upvotes: 1