Victoria Assad

Reputation: 11

Multiple Regression - Error in model.frame.default variable lengths differ

I'm trying to run a multiple regression with 3 independent variables and 3 dependent variables. The question is how water quality influences plankton abundance within and between 3 different locations (aka guzzlers). The water quality variables are pH, phosphates, and nitrates. The dependent/response variables would be the plankton abundance at each of the 3 locations.

Here is my code:

model1 <- lm(cbind(Abundance[Guzzler.. == 1], Abundance[Guzzler.. == 2], 
                   Abundance[Guzzler.. == 3]) ~ Phospates + Nitrates + pH, 
             data=WQAbundancebyGuzzler)

And this is the error message I am getting:

Error in model.frame.default(formula = cbind(Abundance[Guzzler.. == 1],  : 
  variable lengths differ (found for 'Phospates')    

I think it has to do with how my data is set up, but I'm not sure how to change it to get the model to run. What I'm trying to see is how these water quality variables affect the abundance in the different locations and how they vary between locations. So it doesn't seem quite logical to fit multiple models, which was my only other thought.

Here is the output from dput(head(WQAbundancebyGuzzler)):

    structure(list(ï..Date = structure(c(2L, 4L, 1L, 3L, 5L, 2L), .Label = c("11/16/2018", 
"11/2/2018", "11/30/2018", "11/9/2018", "12/7/2018"), class = "factor"), 
    Guzzler.. = c(1L, 1L, 1L, 1L, 1L, 2L), Phospates = c(2L, 
    2L, 2L, 2L, 2L, 1L), Nitrates = c(0, 0.3, 0, 0.15, 0, 0), 
    pH = c(7.5, 8, 7.5, 7, 7, 8), Air.Temp..C. = c(20.8, 25.4, 
    20.9, 16.8, 19.4, 27.4), Relative.Humidity... = c(62L, 31L, 
    41L, 59L, 59L, 43L), DO2.Concentration..mg.L. = c(3.61, 4.48, 
    3.57, 5.65, 2.45, 5.86), Water.Temp..C. = c(14.1, 11.5, 11.8, 
    13.9, 11.1, 17.8), Abundance = c(98L, 43L, 65L, 55L, 54L, 
    29L)), .Names = c("ï..Date", "Guzzler..", "Phospates", "Nitrates", 
"pH", "Air.Temp..C.", "Relative.Humidity...", "DO2.Concentration..mg.L.", 
"Water.Temp..C.", "Abundance"), row.names = c(NA, 6L), class = "data.frame")

Upvotes: 1

Views: 2280

Answers (2)

fujiu

Reputation: 501

I think the problem here is more theoretical: you say that you have three dependent variables that you want to enter into a multiple linear regression. However, at least in classic linear regression, there can only be one dependent variable. There might be ways around this, but I think in your case one dependent variable works just fine: it's `Abundance`. Now, you have sampled three different locations. One way to account for this is to enter the location as a categorical independent variable. So I would propose the following model:

# Make sure that Guzzler is not treated as numeric.
# The column is named `Guzzler..` in the data; store the factor as `Guzzler`.
WQAbundancebyGuzzler$Guzzler <- as.factor(WQAbundancebyGuzzler$Guzzler..)

# Model with 4 independent variables
model1 <- lm(Abundance ~ Guzzler + Phospates + Nitrates + pH, 
             data=WQAbundancebyGuzzler)

It's probably also wise to think about possible interactions here, especially between Guzzler and the other independent variables.
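
As a rough sketch of what such an interaction model could look like, the example below lets the slopes of the water quality variables differ between guzzlers and compares that fit against the additive model. The data frame `sim` is simulated purely for illustration (it is not the asker's data), and the names `fit_add`, `fit_int`, and `cmp` are made up for this example.

```r
# SIMULATED data for illustration only; these are not the asker's measurements
set.seed(1)
n_per <- 10  # hypothetical number of sampling dates per guzzler
sim <- data.frame(
  Guzzler   = factor(rep(1:3, each = n_per)),
  Phospates = runif(3 * n_per, 0, 3),
  Nitrates  = runif(3 * n_per, 0, 0.5),
  pH        = runif(3 * n_per, 6.5, 8.5)
)
sim$Abundance <- round(50 + 5 * sim$Phospates + rnorm(3 * n_per, sd = 10))

# Additive model: each guzzler gets its own intercept, slopes are shared
fit_add <- lm(Abundance ~ Guzzler + Phospates + Nitrates + pH, data = sim)

# Interaction model: slopes are allowed to differ between guzzlers
fit_int <- lm(Abundance ~ Guzzler * (Phospates + Nitrates + pH), data = sim)

# F-test of whether the interaction terms improve the fit
cmp <- anova(fit_add, fit_int)
print(cmp)
```

Note that the interaction model estimates 12 coefficients instead of 6, so it needs a reasonable number of observations per guzzler to be estimable.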

Upvotes: 2

jay.sf

Reputation: 73712

The reason for your error is that you subset only "Abundance" but not the other variables, so their lengths differ. You need to subset the whole data, e.g.

lm(Abundance ~ Phospates + Nitrates + pH, 
   data=WQAbundancebyGuzzler[WQAbundancebyGuzzler$Guzzler.. %in% c(1, 2, 3), ])

With the given head(WQAbundancebyGuzzler), selecting three rows by their Abundance values,

lm(Abundance ~ Phospates + Nitrates + pH, 
   data=WQAbundancebyGuzzler[WQAbundancebyGuzzler$Abundance %in% c(29, 43, 65), ])

results in

# Call:
#   lm(formula = Abundance ~ Phospates + Nitrates + pH, data = WQAbundancebyGuzzler
#   [WQAbundancebyGuzzler$Abundance %in% 
#       c(29, 43, 65), ])
# 
# Coefficients:
#   (Intercept)    Phospates     Nitrates           pH  
#         -7.00        36.00       -73.33           NA  
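
To make the length mismatch concrete, here is a minimal sketch that uses only the six rows from the question's dput() output. The names `d`, `d1`, `len_response`, `len_predictor`, and `msg` are stand-ins chosen for this example.

```r
# First six rows from the question's dput() output (only the columns used here)
d <- data.frame(
  Guzzler.. = c(1L, 1L, 1L, 1L, 1L, 2L),
  Phospates = c(2L, 2L, 2L, 2L, 2L, 1L),
  Nitrates  = c(0, 0.3, 0, 0.15, 0, 0),
  pH        = c(7.5, 8, 7.5, 7, 7, 8),
  Abundance = c(98L, 43L, 65L, 55L, 54L, 29L)
)

# Subsetting only the response shortens it, while the predictors keep all rows
len_response  <- length(d$Abundance[d$Guzzler.. == 1])
len_predictor <- length(d$Phospates)

# Fitting with the mismatched lengths fails; capture the message instead of stopping
msg <- tryCatch(
  lm(Abundance[Guzzler.. == 1] ~ Phospates + Nitrates + pH, data = d),
  error = function(e) conditionMessage(e)
)
print(msg)

# Subsetting the rows of the data frame keeps every column the same length
d1 <- d[d$Guzzler.. == 1, ]
```

Once the rows are subset on the data frame itself, as in `d1`, the response and all predictors have the same number of observations and can be passed to lm() through the data argument.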

Upvotes: 0
