Reputation: 969
I'm trying to implement a linear mixed model using the following code. is.ill is a binary response variable and I'm trying to predict this for each gene in my dataset.
for (gene in 1:122) {
nlme.m1 <- lme(is.ill ~ pulse + age + gender + data[,gene], random = ~1|pulse)
lmm_analysis = summary(nlme.m1)
}
However this code fails right now due to the error:
Error in model.frame.default(formula = ~is.ill + pulse + :
invalid type (list) for variable 'data'
data is 331 x 122 but I can't see what the problem is?
Here's the output of:
str(data)
'data.frame': 331 obs. of 122 variables:
$ A0000004: num 0.616 0.543 0.486 0.326 0.378 ...
$ A0000005: num 1.96 2.05 1.98 1.94 2 ...
$ A0000007: num 0.659 0.588 0.769 0.628 0.464 ...
$ A0000002: num 0.1761 0.0864 0.1492 0.1673 0.2455 ...
$ A0000008: num 0.185 0.117 0.79 0.117 0.117 ...
$ A0000010: num 0.143 0.49 0.49 0.587 0.243 ...
$ A0000011: num 2.2 1.5 1.34 1.27 1.77 ...
$ A0000013: num -0.25 -0.304 -0.467 -0.372 -0.544 ...
$ A0000018: num 2.16 2.06 1.97 1.99 1.87 ...
$ A0000021: num 2.19 2.27 2.28 2.26 2.48 ...
$ A0000022: num 1.068 0.958 1.049 0.966 1.061 ...
$ A0000023: num 0.3945 0.0414 0.1072 0.0607 0.1367 ...
$ A0000025: num 1.111 1.111 1.401 0.831 0.799 ...
$ A0000027: num 1.44 1.36 1.47 1.29 1.56 ...
$ A0000030: num 0.0569 -0.1107 0.0755 -0.0964 0.1875 ...
$ A0000032: num 0.155 0.267 0.885 0.713 0.769 ...
$ A0000035: num 0.0828 -0.2596 -0.068 -0.1244 0.0607 ...
$ A0000036: num 0.4654 0.0294 0.1139 -0.1379 0.4281 ...
$ A0000038: num -0.762 -0.595 -0.332 -0.374 -0.165 ...
$ A0000039: num 0.415 1.955 1.314 2.049 0.415 ...
$ A0000040: num -1.248 -0.73 -0.62 -0.662 -0.917 ...
$ A0000043: num 2.08 1.99 2.07 2.12 2.16 ...
$ A0000045: num 0.999 -0.833 -0.699 -0.907 -0.327 ...
$ A0000046: num 0.615 0.592 0.674 0.539 0.58 ...
$ A0000048: num 2.32 2.04 2.46 2.02 2.36 ...
$ A0000050: num 0.4409 0.2625 0.3139 0.0607 0.238 ...
$ A0000052: num 1.71 1.66 1.94 1.8 1.92 ...
$ A0000053: num 2.24 2.08 2.08 2.03 2.16 ...
$ A0000059: num -0.804 -0.804 -0.804 -0.804 -0.804 ...
$ A0000060: num 2.48 2.69 2.67 2.55 2.57 ...
$ A0000061: num 0.712 0.539 0.697 0.624 0.79 ...
$ A0000062: num 1.52 1.34 1.29 1.23 1.26 ...
$ A0000064: num 0.961 1.654 1.425 1.477 2.121 ...
$ A0000065: num 1.61 1.94 1.8 1.47 1.61 ...
$ A0000067: num -1.85 -1.93 -2.04 -1.93 -2.24 ...
$ A0000068: num -0.64 -0.352 -0.228 -0.147 -0.314 ...
$ A0000069: num 1.146 0.822 1.265 1.173 1.021 ...
$ A0000070: num -0.58 -0.839 0.173 0.983 -0.475 ...
$ A0000074: num 2.92 2.53 3.26 2.55 2.78 ...
$ A0000075: num -0.0306 0.0492 0.1139 0.0492 0.2201 ...
$ A0000076: num -0.368 -0.62 -0.194 -0.387 0.1 ...
$ A0000077: num 1.83 1.87 1.63 1.83 3.13 ...
$ A0000078: num 0.863 0.604 0.814 0.362 0.819 ...
$ A0000079: num 0.477 0.204 0.554 0.767 0.617 ...
$ A0000082: num 2.1 2.04 2.05 2.1 2.18 ...
$ A0000084: num 0.3598 0.3243 0.1553 0.2553 0.0453 ...
$ A0000085: num 2.33 2.3 1.65 2.24 1.14 ...
$ A0000086: num 1.62 1.65 1.53 1.78 1.21 ...
$ A0000088: num 0.276 0.236 0.19 0.124 0.204 ...
$ A0000091: num 1.12 1.03 1.31 1.12 1.16 ...
$ A0000094: num 1.73 1.84 1.8 1.73 1.69 ...
$ A0000095: num 1.72 2.05 2.01 2.02 1.77 ...
$ A0000096: num -0.0395 -0.1463 -0.1062 -0.1463 -0.1931 ...
$ A0000097: num 2.82 2.38 2.48 2.48 2.4 ...
$ A0000100: num 2.42 2.18 2.16 2.21 2.39 ...
etc
Upvotes: 1
Views: 693
Reputation: 226881
In general R 'prefers' it if all the data for a model are included within the same data frame. Different packages vary in how fussy they are about this -- I think your approach might work with lm
, for example -- but it looks like lme
is a bit fussier. Here's an approach that might work:
Make up some testing data:
set.seed(101)
y <- rnorm(100)
x <- runif(100)
f <- gl(10,10)
z <- data.frame(zz1=rnorm(100),zz2=rnorm(100)) ## equivalent to your 'data'
Combine all the data into a single data frame:
d <- data.frame(x,y,f,z)
Construct base model: nothing extra
m0 <- lme(y~x,random=~1|f,data=d)
Put together a formula that looks like ~ . + NEWNAME
, and use it to update the model:
n <- names(z)[1]
m1 <- update(m0,reformulate(c(".",n),response="."))
Apply this approach to all of the columns (genes)
fitList <- lapply(names(z),
function(n) update(m0,reformulate(c(".",n),response=".")))
I'm concerned about a couple of other things with your model specification:
is.ill
sounds like a binary response; have you considered fitting a generalized linear mixed model (with glmer
from the lme4
package)?pulse
in this case) included in the model as both a fixed effect and a grouping variable is a bad idea, although there is an exception where (1) the variable is continuous and (2) there are repeated measurements for many observed values of the variable (in this case the fixed effect will explain linear trend and the random effect will explain random variation around the line of groups of observations).Upvotes: 1