user18772311

Reputation: 29

"Error: Invalid grouping factor specification"

I'm trying to fit a mixed-effects model to some data I'm analyzing. It previously worked as a fixed-effects model, before I changed one of the variables (countryfactor) to a random effect (random intercept). Now when I run it I get the following message:

"Error: Invalid grouping factor specification, countryfactor".

I've seen on other posts that this is usually an issue with there being NA entries, but I've checked all the variables in my model and none have any NA entries.

Does anyone know what might be causing this error message? Posted the model code below.

    glmer(
    formula = 
      as.numeric(wheezing_InD) ~ 
      as.factor(mainfuel) + 
      age_InD + 
      as.factor(gender_InD) +
      as.factor(school_level_InD3) +
      as.factor(enough_money_InD) +
      as.factor(cooking_location_InD) +
      as.factor(other_smokers_household_InD) +
      as.factor(AnyCondition) + 
      as.factor(owned_items_electricity_connection_R) +
      as.factor(HealthAdviceFull) +
      (1|countryfactor),
    family=poisson(link="log"),
    data = Data46)

Update

Tried with a simpler model, with just the first 20 rows and the following 3 columns.

    glmer(
    formula = 
      as.numeric(wheezing_InD) ~ 
      age_InD + 
      (1|countryfactor),
    family=poisson(link="log"),
    data = Data46)

Still have the same error code. Here is a sample of the first 20 rows with these 3 variables, using dput:

structure(list(wheezing_InD = c("No", "Yes", "Yes", "No", "No", 
"No", "No", "No", "Yes", "No", "No", "No", "No", "Yes", "No", 
"No", "Yes", "Yes", "Yes", "No"), age_InD = c(55L, 24L, 23L, 
30L, 40L, 43L, 37L, 38L, 18L, 23L, 28L, 33L, 27L, 54L, 23L, 23L, 
42L, 48L, 31L, 18L), countryfactor = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("cameroon", "ghana", "kenya"), class = "factor")), row.names = c(NA, 
20L), class = "data.frame")

Have also attached the str version too:

'data.frame':   20 obs. of  3 variables:
 $ wheezing_InD : chr  "No" "Yes" "Yes" "No" ...
 $ age_InD      : int  55 24 23 30 40 43 37 38 18 23 ...
 $ countryfactor: Factor w/ 3 levels "cameroon","ghana",..: 1 1 1 1 1 1 1 1 1 1 ...

Upvotes: 0

Views: 1263

Answers (1)

Ben Bolker

Reputation: 226851

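If this is really what your data look like (i.e. the response variable wheezing_InD is a character vector and not a factor), then as.numeric(wheezing_InD) will convert the entire response vector to NAs. Rows with a missing response are dropped before fitting, so the grouping factor is left with no data, which is presumably what triggers the message. Admittedly, lme4 could provide a more informative error message here ...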
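To see the coercion for yourself, here is a minimal sketch in base R. The toy vector is copied from the first few values of the dput() sample in the question; the names as_num_chr and as_num_fac are just illustrative.

```r
# Toy values taken from the dput() sample in the question
wheezing_InD <- c("No", "Yes", "Yes", "No")

# A character vector of "No"/"Yes" has no numeric interpretation, so every
# element becomes NA (R also warns "NAs introduced by coercion")
as_num_chr <- suppressWarnings(as.numeric(wheezing_InD))

# Going through a factor first returns the integer level codes instead
as_num_fac <- as.numeric(factor(wheezing_InD))

print(as_num_chr)  # NA NA NA NA
print(as_num_fac)  # 1 2 2 1
```

Running sum(is.na(as.numeric(Data46$wheezing_InD))) on the full data should tell you whether the whole response is being lost this way.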

Binomial responses can be specified in most R modeling functions very flexibly (I would say too flexibly).

For the ‘binomial’ and ‘quasibinomial’ families the response can be specified in one of three ways:

  1. As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
  2. As a numerical vector with values between ‘0’ and ‘1’, interpreted as the proportion of successful cases (with the total number of cases given by the ‘weights’).
  3. As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.

Let's consider your options:

  • wheezing_InD alone will give an error (it's a character vector, which isn't in the allowed set).
  • as.factor(wheezing_InD) or factor(wheezing_InD) should work fine (option 1 above: the model will estimate the probability of "Yes", since R uses alphabetical order to make "No" the first level and "Yes" the second).
  • as.numeric(factor(wheezing_InD))-1 is OK: as.numeric(as.factor()) converts ("No", "Yes") to (1,2), and subtracting 1 gives (0,1). (This is option 2; we don't need "weights" because we only have 1 'trial' per observation, i.e. Bernoulli/binomial with n=1.)

Option 3 is really only relevant for binomial data with N>1.
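As a rough check that options 1 and 2 are equivalent, here is a sketch using base-R glm() with family = binomial on the 20 rows from the question. I'm using glm() rather than glmer() because this sample contains only one country, so a random intercept can't be fitted on it; the object names (dat, fit_factor, fit_01) are made up for illustration.

```r
# The 20 sample rows from the question (response and age only)
dat <- data.frame(
  wheezing_InD = c("No", "Yes", "Yes", "No", "No", "No", "No", "No", "Yes", "No",
                   "No", "No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes", "No"),
  age_InD = c(55L, 24L, 23L, 30L, 40L, 43L, 37L, 38L, 18L, 23L,
              28L, 33L, 27L, 54L, 23L, 23L, 42L, 48L, 31L, 18L),
  stringsAsFactors = FALSE
)

# Levels are sorted alphabetically, so "No" is the first level (the 'failure')
print(levels(factor(dat$wheezing_InD)))

# Option 1: factor response
fit_factor <- glm(factor(wheezing_InD) ~ age_InD,
                  family = binomial, data = dat)

# Option 2: numeric 0/1 response (level codes 1/2 minus 1)
fit_01 <- glm(I(as.numeric(factor(wheezing_InD)) - 1) ~ age_InD,
              family = binomial, data = dat)

# Both codings should give the same coefficient estimates
print(all.equal(coef(fit_factor), coef(fit_01)))
```

The same response codings should carry over to glmer() with family = binomial once the data include more than one level of countryfactor.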

as.numeric(factor(wheezing_InD)) seems weird to me as it will result in (1,2) responses, which should give you an error?

Upvotes: 1
