Ignore NaNs in Python's statsmodels

Question

Is there a general way to ignore NaNs in statsmodels?

I am using statsmodels' AnovaRM function to run repeated measures ANOVAs on various data sets. There are missing values in different columns for different rows. As expected, AnovaRM then returns nan for the F- and p-values.

I have tried

aovrm = AnovaRM(df3, 'RT', 'id', within=['iv'], missing = 'drop')

as suggested in "Ignoring missing values in multiple OLS regression with statsmodels". However, this does not seem to work for AnovaRM.

So far I have simply excluded the subjects with missing data points, but that a) is really not the point and b) is simply not feasible for many data sets.

Josef · Accepted Answer

From the AnovaRM docstring

"This implementation currently only supports fully balanced designs."

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/stats/anova.py#L413 (AnovaRM has not yet been added to the online documentation.)

So the general missing option of the models is not available for AnovaRM. This is mainly because of the restrictive assumptions that underlie repeated measures ANOVA.
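To see which subjects break the balance requirement, a quick pandas check can help. This is only a sketch: the column names 'id', 'iv' and 'RT' are taken from the question, the data is made up, and it assumes the design calls for exactly one observation per subject and condition.

```python
import numpy as np
import pandas as pd

# Made-up long-format data: one row per subject ('id') and condition ('iv').
df3 = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "iv": ["a", "b", "c"] * 3,
    "RT": [510.0, 540.0, 575.0, 498.0, np.nan, 560.0, 530.0, 551.0, 590.0],
})

# Count the non-missing observations in every subject x condition cell.
counts = (
    df3.dropna(subset=["RT"])
    .groupby(["id", "iv"])
    .size()
    .unstack(fill_value=0)
)

# Balanced here means: every cell holds exactly one observation.
is_balanced = bool((counts == 1).all().all())

# Subjects with at least one empty (or duplicated) cell.
incomplete_ids = counts.index[(counts != 1).any(axis=1)].tolist()

print(is_balanced)
print(incomplete_ids)
```

Note that a subject whose values are all missing would not appear in counts at all, so such subjects would have to be found separately.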

As an alternative, the general recommendation in the literature is to use mixed effects models, which are available as MixedLM in statsmodels. Other options would be to use GEE or fixed effects with OLS.
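A minimal sketch of the MixedLM route, using a random intercept per subject. The data below is simulated, the column names 'id', 'iv' and 'RT' are borrowed from the question, and the random-intercept structure is an assumption that may not suit every design. Rows with a missing response are dropped explicitly with pandas, so subjects with partial data still contribute their remaining observations.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated long-format data (hypothetical): 20 subjects x 3 conditions.
rng = np.random.default_rng(0)
n_subjects = 20
conditions = ["a", "b", "c"]
subject_offset = rng.normal(0, 50, n_subjects)  # per-subject baseline shift

rows = []
for i in range(n_subjects):
    for j, cond in enumerate(conditions):
        rt = 500 + 30 * j + subject_offset[i] + rng.normal(0, 20)
        rows.append({"id": i, "iv": cond, "RT": rt})
df3 = pd.DataFrame(rows)

# Knock out a few responses to mimic missing data points.
df3.loc[[1, 8, 17, 40], "RT"] = np.nan

# Drop only the rows with a missing response, not whole subjects.
clean = df3.dropna(subset=["RT"])

# Random intercept for each subject; 'iv' enters as a categorical fixed effect.
model = smf.mixedlm("RT ~ C(iv)", data=clean, groups=clean["id"])
result = model.fit()
print(result.summary())
```

Unlike AnovaRM, this does not need a balanced design, so the four affected subjects stay in the analysis with their two remaining conditions.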
