Why does summary() output label factor levels differently for models, depending on previous commands?

Question

I am working on a large R project to perform different analyses of a common dataset. I have built up several individual scripts for each analysis, as well as high-level scripts to call each one in sequence. Each script starts by calling an init.R script that wipes the memory ( rm(list=ls(all=TRUE)) ).

I have recently discovered that summary() (and, I think, coef()) produces different output depending on the order in which the scripts are run. When the scripts that fit models using lm() or gam() (mgcv package) are run first, in a "fresh" R session, the summary() output lists factors with their full level labels.

However, if I run other scripts first, which use simple nested aov() functions and produce some graphs and other output using some other packages, then re-run the previously-mentioned scripts, summary() instead produces output with factor levels labeled using numbers (the 'coded' values, not the actual factor level labels).

This is not something I can easily "reproduce" using a minimal working example, unfortunately, because I haven't quite pinpointed where in my scripts this behaviour changes. I have confirmed a few things in quick tests:

- Memory is cleared in between scripts using rm(list=ls()), so there shouldn't be any objects in memory causing this change.
- summary() itself does not change: the model-fitting functions actually produce slightly different output (as confirmed with all.equal()), which is even more disturbing. Saved model objects reliably produce the same output whenever they are loaded, but that output differs depending on the order of the scripts used to generate them (even though memory is cleared in between each script).
- Depending on the order of scripts, summary( lm(...) ) also outputs different estimates for model terms, but the same Residuals summary, R^2, and overall F-test. Very bizarre.
- I cannot recover the default (desired) behaviour by removing packages loaded in prior scripts. Does the order of loading packages matter?
- Default behaviour is restored after quitting and re-starting R.
- Nothing in this answer seemed to fix the problem: Reset R instance

Ideally, I would like my project to be able to reproduce all results and output by simply source()ing each script in turn, but this strange 'bug' (in my code - I'm not blaming this on R) means that the output is not consistent and depends on the order :(

Is there anything other than objects or packages that stays in memory that could alter the way model-fitting functions work, or store factor levels in data-frames that are passed in?

EDIT

I realized the answer to the above question was the contrasts option (see below). New question:
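
To illustrate the effect, here is a minimal sketch. It uses the built-in warpbreaks dataset as a stand-in for the project data, and contr.sum as an example of a non-default setting that an earlier script might leave behind; both are assumptions for illustration, not what my scripts actually do.

```r
# Suppose an earlier script changed the contrasts option (hypothetical example).
options(contrasts = c("contr.sum", "contr.poly"))

# rm() removes objects from the workspace, but options() are not objects,
# so the setting is still in effect afterwards.
rm(list = ls(all.names = TRUE))
print(getOption("contrasts"))

# Fit the same model under the leftover setting and under R's startup default.
fit_sum <- lm(breaks ~ tension, data = warpbreaks)

options(contrasts = c("contr.treatment", "contr.poly"))
fit_trt <- lm(breaks ~ tension, data = warpbreaks)

# Coefficient labels (and estimates) differ between the two parameterisations:
print(names(coef(fit_sum)))  # numbered levels: tension1, tension2
print(names(coef(fit_trt)))  # labelled levels: tensionM, tensionH

# The fitted values, and hence residuals and R^2, are the same.
print(all.equal(fitted(fit_sum), fitted(fit_trt)))
print(all.equal(summary(fit_sum)$r.squared, summary(fit_trt)$r.squared))
```

This matches the symptoms described above: different term labels and estimates, but identical residual summary, R^2, and overall F-test.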

How can you reset options() to the default settings, i.e. to the values used when R starts up? The 'factory default' is options(contrasts=c("contr.treatment","contr.poly")), but I'm wondering if there is a way to reset to the values in effect at startup (in case they aren't 'factory fresh').
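
One possible workaround, as a sketch only: take a snapshot of options() the first time init.R runs in a session and restore it at the start of every script. The option name project.startup.options is hypothetical (made up for this example). The snapshot is stored as an option rather than as an object so that rm(list=ls(all=TRUE)) does not delete it. Note that this only recovers the settings in effect when the snapshot was taken, which are not necessarily the factory defaults.

```r
# In init.R: take a snapshot once per session, before any script changes options.
# "project.startup.options" is a hypothetical name chosen for this sketch.
if (is.null(getOption("project.startup.options"))) {
  options(project.startup.options = options())
}

# ... later, some script changes a setting (example only) ...
options(contrasts = c("contr.sum", "contr.poly"))

# At the top of each script (e.g. in init.R): restore the snapshot.
# options() accepts a named list, so the saved list can be passed back in directly.
options(getOption("project.startup.options"))

print(getOption("contrasts"))
```

For a single setting, the usual idiom is narrower: old <- options(contrasts = ...) returns the previous value, and options(old) puts it back once the script is done.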
