Reputation: 585
I'm trying to calculate the normalized mutual information between two 256*256 image label maps, each flattened into a 1-D array.
sklearn's documentation is clear that the function normalized_mutual_info_score should only output values between 0 and 1.
However, when comparing lists with many elements, it sometimes gives me negative values or values larger than 1.
Is it an overflow/underflow problem? (I get a more realistic answer if I change average_method to "arithmetic", "min", or "max", but I'm not sure which one would be the most appropriate in my case.)
I'm using sklearn 0.20.0. Here is a synthetic example to reproduce the problem:
from sklearn import metrics

metrics.normalized_mutual_info_score([0]*100001, [0]*100000 + [1])
metrics.normalized_mutual_info_score([0]*110001, [0]*110000 + [1])
I expect both calls to return 0, but instead I get 7.999 and -7.999 respectively.
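For comparison, here is a small sketch (against sklearn 0.20.0, where the valid options for average_method are 'min', 'geometric', 'arithmetic', and 'max') that runs each averaging method on the first example:

from sklearn import metrics

labels_a = [0] * 100001
labels_b = [0] * 100000 + [1]

# Run every available averaging method on the same pair of label lists
# to see which ones stay within [0, 1].
for method in ("min", "geometric", "arithmetic", "max"):
    score = metrics.normalized_mutual_info_score(labels_a, labels_b, average_method=method)
    print(method, score)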
Upvotes: 3
Views: 1459
Reputation: 16966
As you have mentioned, setting average_method gives sensible values:
from sklearn.metrics import normalized_mutual_info_score

normalized_mutual_info_score([0]*100001, [0]*100000 + [1], average_method='arithmetic')
# 3.166757680223739e-14
I would suggest using 'arithmetic',
since it is going to be the default value in the next version, 0.22 (reference).
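For your image-label use case, a minimal sketch (img_labels_1 and img_labels_2 are hypothetical placeholders for your two flattened 256*256 label arrays):

import numpy as np
from sklearn import metrics

# Hypothetical 256*256 label maps, flattened to 1-D arrays.
img_labels_1 = np.random.randint(0, 5, size=(256, 256)).ravel()
img_labels_2 = np.random.randint(0, 5, size=(256, 256)).ravel()

# Passing average_method explicitly keeps the result stable across versions.
nmi = metrics.normalized_mutual_info_score(img_labels_1, img_labels_2, average_method='arithmetic')
print(nmi)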
Upvotes: 1