Reputation: 377
I am working on a large R project to perform different analyses of a common dataset. I have built up several individual scripts for each analysis, as well as high-level scripts to call each one in sequence. Each script starts by calling an init.R script that wipes the memory (rm(list=ls(all=TRUE))).
I have recently discovered that summary() (and, I think coef()) produces different output, depending on the order of the scripts. In scripts that fit models using lm() or gam() (mgcv package), if these are run first, in a "fresh" R session, the summary() output lists factors with the full labels.
However, if I run other scripts first, which use simple nested aov() functions and produce some graphs and other output using some other packages, then re-run the previously-mentioned scripts, summary() instead produces output with factor levels labeled using numbers (the 'coded' values, not the actual factor level labels).
This is not something I can easily "reproduce" using a minimal working example, unfortunately, because I haven't quite pinpointed where in my scripts this behaviour changes. I have confirmed a few things in quick tests:
- Every script runs rm(list=ls()), so there shouldn't be anything in memory causing this change.
- summary() itself does not change: the model-fitting functions actually produce slightly different output (as confirmed with all.equal()), which is even more disturbing. Saved objects produced when running the scripts in a different order reliably produce the same output whenever they are loaded, but that output differs depending on the order of scripts used to generate the fitted model objects (even though memory is cleared in between each script).
- summary( lm(...) ) also outputs different estimates for model terms, but the same Residuals summary, R^2, and overall F-test. Very bizarre.
Ideally, I would like my project to be able to reproduce all results and output by simply source()ing each script in turn, but this strange 'bug' (in my code - I'm not blaming this on R) means that the output is not consistent and depends on the order :(
Is there anything other than objects or packages that stays in memory that could alter the way model-fitting functions work, or store factor levels in data-frames that are passed in?
I realized the answer to the above question was the contrasts option (see below). New question:
How can you reset options() to the default settings, i.e. to the values used when R starts up? The 'factory default' is
options(contrasts=c("contr.treatment","contr.poly"))
but I'm wondering if there is a way to reset to the internal defaults (in case they aren't 'factory fresh').
Upvotes: 1
Views: 1436
Reputation: 377
After comparing outputs, I realized I was looking at different contrasts, and remembered that the 'offending' script changed the contrasts options from the default:
options(contrasts=c("contr.sum","contr.poly"))
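For anyone wanting to see the effect in isolation, here is a minimal sketch. It uses the built-in iris data as a stand-in (an assumption, not my project data) and fits the same model under both contrast settings. The coefficient names and estimates change, while the fitted values and R^2 do not, which matches the symptoms described in the question.

```r
# Illustrative only: iris stands in for the real project data.
# Set the usual default contrasts and remember the previous setting.
old <- options(contrasts = c("contr.treatment", "contr.poly"))

fit_trt <- lm(Sepal.Length ~ Species, data = iris)
names(coef(fit_trt))
# "(Intercept)" "Speciesversicolor" "Speciesvirginica"

# Switch to sum-to-zero contrasts, as the 'offending' script did
options(contrasts = c("contr.sum", "contr.poly"))

fit_sum <- lm(Sepal.Length ~ Species, data = iris)
names(coef(fit_sum))
# "(Intercept)" "Species1" "Species2"

# Same model fit, different parameterisation of the factor
same_fitted <- all.equal(fitted(fit_trt), fitted(fit_sum))
same_r2 <- all.equal(summary(fit_trt)$r.squared,
                     summary(fit_sum)$r.squared)

# Put the contrasts option back to what it was before this sketch
options(old)
```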
So, that explains all the confusion above. Hope that saves somebody else some hair-pulling. New question:
How can you reset options() to the default settings, i.e. to the values used when R starts up?
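As far as I know, base R has no single call that restores the startup values, so one possible workaround is to take a snapshot of options() before anything changes them and restore it later. The sketch below assumes the snapshot is taken at the very start of the session; the name startup_opts is just a placeholder.

```r
# Snapshot every option. This assumes nothing has changed them yet,
# e.g. the line runs at the very start of the session.
startup_opts <- options()

# Stand-in for a script that changes the contrasts
options(contrasts = c("contr.sum", "contr.poly"))

# Restore the snapshot. Options that did not exist when the snapshot
# was taken are left as they are.
options(startup_opts)

getOption("contrasts")
```

One caveat for my setup: init.R runs rm(list=ls(all=TRUE)), which would also delete a snapshot stored in the global workspace. Setting the one option the analyses depend on explicitly in init.R, or running each script in its own fresh R process, may be simpler.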
Upvotes: 1