Reputation: 303
Is there a general way to ignore NaNs in statsmodels?
I am using statsmodels' AnovaRM function to run repeated measures ANOVAs on various data sets. There are missing values in different columns for different rows. When running AnovaRM, it obviously returns nan for F- and p-values.
I have tried
aovrm = AnovaRM(df3, 'RT', 'id', within=['iv'], missing = 'drop')
as suggested in "Ignoring missing values in multiple OLS regression with statsmodels"; however, this does not seem to work for AnovaRM.
So far I have simply excluded the subjects with missing data points, but that a) is really not the point and b) is simply not feasible for many data sets.
Upvotes: 2
Views: 2241
Reputation: 22897
From the AnovaRM docstring
"This implementation currently only supports fully balanced designs."
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/stats/anova.py#L413 (AnovaRM has not yet been added to the online documentation.)
So the general missing option of the models is not available for AnovaRM. This is mainly because of the restrictive assumptions that underlie repeated measures ANOVA.
As an alternative, the general recommendation in the literature is to use mixed effects models, which are available as MixedLM in statsmodels. Other options would be to use GEE or fixed effects with OLS.
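A minimal sketch of the mixed effects route, assuming long-format data with the column names from the question (RT, id, iv). The data here is simulated purely for illustration, and the random-intercept-per-subject specification is an assumption, not a drop-in equivalent of AnovaRM. Rows with a missing RT are dropped individually with pandas before fitting, so subjects with partial data stay in the model:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: 20 subjects x 3 within-subject conditions.
rng = np.random.default_rng(0)
n_subjects = 20
conditions = ["a", "b", "c"]
df = pd.DataFrame({
    "id": np.repeat(np.arange(n_subjects), len(conditions)),
    "iv": np.tile(conditions, n_subjects),
})
subject_effect = rng.normal(0, 30, n_subjects)[df["id"].to_numpy()]
condition_effect = df["iv"].map({"a": 0.0, "b": 20.0, "c": 40.0}).to_numpy()
df["RT"] = 500 + subject_effect + condition_effect + rng.normal(0, 25, len(df))

# Knock out a few observations so the design is no longer balanced.
df.loc[[1, 7, 13, 30], "RT"] = np.nan

# Drop only the rows with a missing response, not the whole subject.
df_complete = df.dropna(subset=["RT"])

# Random intercept per subject, fixed effect for the within-subject factor.
model = smf.mixedlm("RT ~ C(iv)", data=df_complete, groups="id")
result = model.fit()
print(result.summary())
```

The fixed effect rows for C(iv) in the summary take the place of the within-subject F-test; they are coefficient-level tests against the reference condition rather than a single omnibus test.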
Upvotes: 2