Why does np.percentile return NaN for high percentiles?

Question

This code:

print(len(my_series))
print(np.percentile(my_series, 98))
print(np.percentile(my_series, 99))

gives:

14221  # This is the series length
1644.2  # 98th percentile
nan  # 99th percentile?

Why does the 98th percentile work fine while the 99th gives nan?

Niels Henkens · Accepted Answer

np.percentile effectively treats NaN values as very high numbers, because they end up at the top of the sorted data. The highest percentiles therefore fall in the range occupied by NaNs. In your case, between 1 and 2 percent of your data is NaN: the 99th percentile returns nan, while the 98th returns a number. Note that this number is not actually the 98th percentile of the valid values, since the NaNs are still counted as part of the data.
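A quick way to confirm this is to count the NaN values directly. The snippet below is only a sketch: `data` is a made-up array standing in for `my_series`, with two NaNs placed at arbitrary positions.

```python
import numpy as np

# Hypothetical stand-in for my_series: 100 values, 2 of them NaN
data = np.arange(100, dtype=float)
data[[10, 50]] = np.nan

# np.isnan gives a boolean mask; summing it counts the NaN entries
nan_count = int(np.isnan(data).sum())
nan_fraction = nan_count / data.size

print(nan_count, nan_fraction)
```

If the fraction for your own series comes out between 0.01 and 0.02, that matches the behavior you are seeing.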

To calculate percentiles while ignoring the NaN values, you can use np.nanpercentile().

So:

print(np.nanpercentile(my_series, 98))
print(np.nanpercentile(my_series, 99))

Edit: In newer NumPy versions, np.percentile returns nan whenever NaN values are present, which makes the problem immediately apparent. np.nanpercentile still works the same.
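As a small illustration of the difference, here is a sketch on a toy array (the values are invented for the example, and the np.percentile result assumes a recent NumPy version):

```python
import numpy as np

# Toy data with one NaN, invented for illustration
values = np.array([1.0, 2.0, 3.0, 4.0, np.nan])

# On recent NumPy versions, a NaN in the input propagates to the result
p_all = np.percentile(values, 50)

# np.nanpercentile ignores NaN values
p_valid = np.nanpercentile(values, 50)

# Equivalent manual approach: drop the NaNs first, then use np.percentile
p_manual = np.percentile(values[~np.isnan(values)], 50)

print(p_all, p_valid, p_manual)
```

Filtering with a boolean mask gives the same result as np.nanpercentile here, which can be handy if you need the cleaned array for other calculations as well.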
