James Draper

Reputation: 5310

Why is scipy.stats.ttest_ind throwing a new RuntimeWarning when comparing NaNs?

I'm working with some pretty huge but sparsely populated pandas DataFrames. I use scipy.stats.ttest_ind to compare some of these columns, which contain many NaNs. I recently updated to Anaconda 4.2.12, and now when I use scipy.stats.ttest_ind I get the runtime warnings shown in the example below.

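import numpy as np
import scipy.stats  # a bare "import scipy" does not load the stats submodule

# Two arrays of five NaNs each (same as np.linspace(np.nan, np.nan, 5))
case1 = case2 = np.full(5, np.nan)
scipy.stats.ttest_ind(case1, case2)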

Output:
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1748: RuntimeWarning: invalid value encountered in greater
    cond1 = (scale > 0) & (x > self.a) & (x < self.b)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1748: RuntimeWarning: invalid value encountered in less
    cond1 = (scale > 0) & (x > self.a) & (x < self.b)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1749: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)

So the function still runs, and I can use its output just as I did before the update; the only difference is that I now get these runtime warnings.

If I drop all of the NaNs in my DataFrames, then ttest_ind works just fine. But I don't want to do that, because I need to maintain the structure of the DataFrames.

Does anyone know why this is happening? Is there anything I can do besides continuing to use the function while ignoring the warning, or writing some kind of hacked-up workaround function?

Upvotes: 1

Views: 1643

Answers (2)

Danting

Reputation: 11

I just found an option:

nan_policy='omit'

So try this (cls_up and cls_down are DataFrames with a cause_pct column):

from scipy.stats import ttest_ind
t, p = ttest_ind(cls_up['cause_pct'], cls_down['cause_pct'], nan_policy='omit')

Hope it can be helpful in your case too!
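A minimal, self-contained sketch of the same idea, assuming a DataFrame with NaN-padded columns (the column names a and b and the values are made up for illustration). With nan_policy='omit', the NaNs are skipped inside the call, so the DataFrame itself is left untouched, and the result should match running the test on each column after dropna():

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical sparse DataFrame: both columns contain NaNs.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0, np.nan],
    "b": [2.0, np.nan, 3.5, 6.0, np.nan, 7.0],
})

# nan_policy='omit' ignores NaNs in each sample; df is not modified.
t_omit, p_omit = ttest_ind(df["a"], df["b"], nan_policy="omit")

# Equivalent: dropna() returns new Series, so only the test sees the
# NaN-free data and the DataFrame keeps its shape.
t_drop, p_drop = ttest_ind(df["a"].dropna(), df["b"].dropna())

print(t_omit, p_omit)
print(t_drop, p_drop)
print(df.shape)  # still (6, 2)
```

Either form keeps the DataFrame structure intact; nan_policy='omit' just saves writing the dropna() calls.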

Upvotes: 1

piRSquared

Reputation: 294506

When I do the following, I get the same kind of RuntimeWarning:

np.array([np.nan, -1]) < 0


However, I can wrap it in a pandas Series and let pandas suppress the warning:

pd.Series([np.nan, -1]).lt(0).values

array([False,  True], dtype=bool)
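An alternative to the pandas wrapper, swapping in NumPy's own np.errstate context manager: it silences NumPy's floating-point "invalid value" warnings only inside a with block, so it can also cover the warnings raised from inside ttest_ind in the question. This is a minimal sketch; whether these warnings appear at all may depend on your NumPy and SciPy versions.

```python
import numpy as np
from scipy.stats import ttest_ind

case1 = case2 = np.full(5, np.nan)

# np.errstate temporarily changes NumPy's floating-point error handling.
# invalid="ignore" silences "invalid value encountered in ..." warnings
# raised inside the block; the previous settings are restored afterwards.
with np.errstate(invalid="ignore"):
    mask = np.array([np.nan, -1]) < 0
    result = ttest_ind(case1, case2)

print(mask)    # [False  True]
print(result)  # statistic and pvalue are both nan
```

Note that np.errstate only changes how NumPy reports floating-point errors; a RuntimeWarning that a library emits itself with warnings.warn is not affected. For those, warnings.catch_warnings() combined with warnings.simplefilter("ignore", RuntimeWarning) silences them locally.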

Upvotes: 2
