Find pattern only until first occurrence of another pattern (or: how to remove random effects from a formula of mixed effects models)

Question

I want to extract information from model formulas. In particular, I want to remove the random effects to get just the "fixed effects part" of mixed models (lme4 notation).

To do this, I search for the last + in the formula before an opening parenthesis ( is found. Everything up to that + must be the "fixed" part of the formula. This works fine for models with fixed-effects predictors / variables.

However, for null models (intercept-only in the fixed effects), there might be no +, e.g. if the formula is Reaction ~ (Days | Subject). In this case, I check whether there is no + sign. But this does not work for models with multiple random parts. In the examples below, the grepl() call for f2 should return FALSE, but it returns TRUE, because the + is found in front of the second opening parenthesis in the random parts.

My question: How can I stop checking for + after the first (, so that a possible second or third random effect term is ignored? The goal is for the below example that the grepl()-commands return FALSE, FALSE, TRUE, TRUE.

f1 <- "Reaction ~ (1 + Days | Subject)"
f2 <- "Reaction ~ (1 | mygrp/mysubgrp) + (1 | Subject)"
f3 <- "Reaction ~ x1 + x2 + (1 + Days | Subject)"
f4 <- "Reaction ~ x1 + x2 + (1 | mygrp/mysubgrp) + (1 | Subject)"

# works!
grepl("\\+(\\s)*\\((.*)\\)", f1) # should return FALSE
#> [1] FALSE

# fails...
grepl("\\+(\\s)*\\((.*)\\)", f2) # should return FALSE
#> [1] TRUE

# works!
grepl("\\+(\\s)*\\((.*)\\)", f3) # should return TRUE
#> [1] TRUE

# works!
grepl("\\+(\\s)*\\((.*)\\)", f4) # should return TRUE
#> [1] TRUE

Oliver · Accepted Answer

This does not really answer your question from a regex perspective (for which there likely is an answer), but if your goal is to extract the random-effects and/or fixed-effects formulas, you might gain more from looking at the source code of glFormula and lFormula from the lme4 package itself. As they create the design matrices X and Z for the fixed and random effects respectively, they have to extract the individual parts at some point.

For example, to extract the fixed effects, the functions nobars and RHSForm are used:

library(lme4)
f1 <- Reaction ~ (1 + Days | Subject)
f2 <- Reaction ~ (1 | mygrp/mysubgrp) + (1 | Subject)
f3 <- Reaction ~ x1 + x2 + (1 + Days | Subject)
f4 <- Reaction ~ x1 + x2 + (1 | mygrp/mysubgrp) + (1 | Subject)
(f1FixedEffects <- nobars(lme4:::RHSForm(f1))) # note the triple colon in 'lme4:::'; RHSForm is not exported from the lme4 namespace
[1] 1
(f2FixedEffects <- nobars(lme4:::RHSForm(f2)))
[1] 1
(f3FixedEffects <- nobars(lme4:::RHSForm(f3)))
x1 + x2
(f4FixedEffects <- nobars(lme4:::RHSForm(f4)))
x1 + x2

If the desire is to extract the entire formula you can use

lme4:::RHSForm(f1) <- nobars(lme4:::RHSForm(f1))
f1
Reaction ~ 1

or similar (thanks to AkselA for his comment)

nobars(f1)
Reaction ~ 1

for the fixed effects.

Note that I converted your string formulas to formula objects. This could also be done with as.formula().
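On the regex side of the question, one possible sketch is to anchor the search at the start of the string and allow only non-"(" characters before the +, so that nothing after the first opening parenthesis can match. The helper name has_fixed_predictors is hypothetical (it is not part of lme4), and the sketch assumes that the fixed part itself contains no parentheses; a term such as log(x1) or I(x^2) would break it, which is another reason to prefer nobars().

```r
# Hypothetical helper (not part of lme4): TRUE if a "+" occurs before the
# first "(" in a formula string.
# Assumption: fixed-effects terms contain no parentheses of their own.
has_fixed_predictors <- function(f) {
  # "^[^(]*" consumes characters from the start of the string that are not "(",
  # so the "\\+" can only be matched before the first random-effects term.
  grepl("^[^(]*\\+", f)
}

formulas <- c(
  f1 = "Reaction ~ (1 + Days | Subject)",
  f2 = "Reaction ~ (1 | mygrp/mysubgrp) + (1 | Subject)",
  f3 = "Reaction ~ x1 + x2 + (1 + Days | Subject)",
  f4 = "Reaction ~ x1 + x2 + (1 | mygrp/mysubgrp) + (1 | Subject)"
)

# grepl() is vectorised, so all four strings are checked in one call.
result <- has_fixed_predictors(formulas)
print(result)
```

For the four example formulas this gives FALSE, FALSE, TRUE, TRUE, which is the result asked for in the question.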
