I'm a bit confused about how to set priors for multiple predictors in the following model:
require(rstanarm)
wi_prior <- normal(0, sd(train$attendance))
SEED <- 101
fmla <- attendance ~ (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
                        WSWin1 | franchID)
baylm <- stan_glmer(fmla,
                    data = train,
                    family = "gaussian",
                    algorithm = "sampling",
                    adapt_delta = .95,
                    prior_intercept = wi_prior, seed = SEED)
Here is the first observation in train, per request.
train <- structure(list(franchID = structure(25L, .Label = c("ANA", "ARI",
"ATL", "BAL", "BOS", "CHC", "CHW", "CIN", "CLE", "COL", "DET",
"FLA", "HOU", "KCR", "LAD", "MIL", "MIN", "NYM", "NYY", "OAK",
"PHI", "PIT", "SDP", "SEA", "SFG", "STL", "TBD", "TEX", "TOR",
"WSN"), class = "factor"), yearID = 1999L, name = "San Francisco Giants",
park = "3Com Park", attendance = 2078399L, W = 86L, W1 = 89L,
W2 = 90L, W3 = 68L, WCWin1 = FALSE, WCWin2 = FALSE, WCWin3 = FALSE,
DivWin1 = FALSE, DivWin2 = TRUE, DivWin3 = FALSE, LgWin1 = FALSE,
LgWin2 = FALSE, LgWin3 = FALSE, WSWin1 = FALSE, WSWin2 = FALSE,
WSWin3 = FALSE), .Names = c("franchID", "yearID", "name",
"park", "attendance", "W", "W1", "W2", "W3", "WCWin1", "WCWin2",
"WCWin3", "DivWin1", "DivWin2", "DivWin3", "LgWin1", "LgWin2",
"LgWin3", "WSWin1", "WSWin2", "WSWin3"), row.names = c(NA, -1L
), class = "data.frame")
You can specify a prior for the coefficients on K predictors by passing a vector of length K to one of the supported prior distributions. For example, if K = 4 you could do
wi_prior2 <- normal(location = c(0, 1, -2, 5))
You could also pass a vector of scales and/or use a family other than normal. Then you would call stan_glmer with prior = wi_prior2 (prior_intercept, which you used, applies only to the intercept). If you do
wi_prior2 <- normal(location = 0)
then the same prior would be used for all K common coefficients.
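Putting those pieces together, here is a minimal sketch, assuming rstanarm is installed. The scale values and the Student t degrees of freedom are placeholders chosen for illustration, not recommendations; the object names shared_prior and t_prior are hypothetical.

```r
library(rstanarm)

# K = 4 common coefficients: one location per coefficient, matched to the
# coefficients in the order their terms appear in the fixed-effects formula.
wi_prior2 <- normal(location = c(0, 1, -2, 5),
                    scale = c(1, 1, 2, 2))  # placeholder scales, one per coefficient

# A single location (and scale) is recycled, so all K coefficients share it.
shared_prior <- normal(location = 0, scale = 2.5)

# Another family works the same way, e.g. Student t with 7 degrees of freedom,
# per-coefficient locations and one shared (recycled) scale.
t_prior <- student_t(df = 7, location = c(0, 1, -2, 5), scale = 2.5)
```

Any of these objects would then go to stan_glmer through its prior argument.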
However, in your case I suspect that fmla is mistaken. You typically also want to include most, if not all, of those predictors outside the lme4-style parenthetical expression, so that they have common effects across all levels of franchID. Thus, fmla would become
fmla <- attendance ~ W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
  WSWin1 + (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
              WSWin1 | franchID)
If you include only the part in parentheses, you are assuming that the coefficients on these variables are exactly zero in the population and deviate from zero only in the subpopulations defined by the levels of franchID. In that case there are no common coefficients to put prior distributions on.
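A sketch of the corrected model with one prior per common coefficient, assuming rstanarm is installed. Building the formula from a single vector of predictor names is just a convenience so that its length always equals K; the names preds, coef_prior and fit_attendance are hypothetical, and the prior locations and scales are placeholders. The intercept prior copies the one from the question.

```r
library(rstanarm)

# Common (population-level) predictors, listed once so that the formula and
# the prior vector cannot drift out of sync. Each logical predictor such as
# DivWin1 contributes a single column, so K = 8 here.
preds <- c("W", "W1", "W2", "W3", "DivWin1", "DivWin2", "DivWin3", "WSWin1")
rhs <- paste(preds, collapse = " + ")

# Same predictors outside the parentheses (common effects) and inside them
# (franchID-specific deviations), matching the corrected fmla above.
fmla <- as.formula(paste0("attendance ~ ", rhs, " + (1 + ", rhs, " | franchID)"))

# One location and one scale per common coefficient, in formula order.
# The values are placeholders; autoscale = TRUE lets rstanarm rescale them
# using the spread of the outcome and of each predictor.
coef_prior <- normal(location = rep(0, length(preds)),
                     scale = rep(2.5, length(preds)),
                     autoscale = TRUE)

# Hypothetical wrapper around the fit; `train` is the asker's data frame.
fit_attendance <- function(train, seed = 101) {
  stan_glmer(fmla, data = train, family = gaussian(),
             prior = coef_prior,                                # K common coefficients
             prior_intercept = normal(0, sd(train$attendance)), # as in the question
             adapt_delta = 0.95, seed = seed)
}
```

Calling fit_attendance(train) on the full data set runs the sampler; the one-row excerpt in the question is only there to show the column types.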
The prior on the group-specific deviations from the common coefficients is conditionally multivariate normal with mean vector zero and an unknown covariance matrix, which itself gets a somewhat complicated prior (set through the prior_covariance argument of stan_glmer, decov() by default). This is explained in more detail in help(priors, package = "rstanarm").
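For completeness, a minimal sketch of making that covariance prior explicit, assuming rstanarm is installed. The decov() arguments below are simply its defaults written out, and cov_prior is a hypothetical name; the stan_glmer call is left as a comment because it needs the full training data.

```r
library(rstanarm)

# decov() is the default prior for the covariance matrix of the franchID-level
# deviations; its arguments are written out here at their default values.
cov_prior <- decov(regularization = 1, concentration = 1, shape = 1, scale = 1)

# Pass it alongside the coefficient and intercept priors, e.g.
#   baylm <- stan_glmer(fmla, data = train, prior = ..., prior_intercept = ...,
#                       prior_covariance = cov_prior)
# and afterwards list every prior that was actually used (after any
# autoscaling) with
#   prior_summary(baylm)
```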