Javier Hernando
Javier Hernando

Reputation: 368

Looping/sapply through nlme function

I am trying to execute a loop with mixed-model effects with response variable changing. I came from here and here. I should say that I have tried sthg creating a function and then sapply or lapply (wihtout success)

I provide a small dataset (really small) just to represent my original database (much larger and similar to those of longitudinal studies)

data<- structure(list(paciente = structure(c(6134, 6099, 6457, 6164, 
6470, 6323, 6550, 6082, 6476, 6044, 6509, 6539, 6234, 6555, 6383, 
6127, 6507, 6513, 6486, 6080, 6101, 6007, 6023, 6516, 6001, 6198, 
6510, 6530, 6351, 6181), label = "Paciente", format.spss = "F6.0"), 
    edad_s1 = structure(c(70, 63, 61, 71, 67, 59, 63, 69, 67, 
    67, 67, 72, 65, 72, 63, 65, 60, 64, 56, 63, 57, 62, 72, 60, 
    72, 63, 72, 68, 66, 71), label = "Edad", format.spss = "F3.0"), 
    sexo_s1 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 
    2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 
    2L, 1L, 2L, 1L, 2L, 2L), .Label = c("Hombre", "Mujer"), label = "Sexo", class = "factor"), 
    time = c(2, 1, 2, 1, 0, 0, 1, 0, 2, 1, 1, 0, 1, 2, 1, 2, 
    1, 2, 0, 1, 1, 0, 2, 1, 0, 2, 1, 2, 2, 0), grupo_int_v00 = structure(c(1L, 
    1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 
    2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), .Label = c("A", 
    "B"), label = "Grupo de intervención", class = "factor"), 
    peso1 = c(108, 80.4, 95, 75, 92.6, 90, 82.2, 94.4, 78, 71.3, 
    75.1, 83.5, 87.1, 63, 73, 98.5, 90.2, 81.3, 93.4, 79.8, 114.3, 
    110.9, 81.5, 88.5, 82.4, 88.3, 90, 73, 79, 94.8), cintura1 = c(127, 
    100.5, 103.5, 108, 115, 114.5, 95.5, 115, 101, 98, 99, 108.5, 
    105, 99, 104, 126, 114.2, 99, 110, 104.5, 120, 126, 111.5, 
    102, 117, 110, 125, 100, 104, 123), tasis2_e = c(156, 129, 
    131, 138, 167, 138, 115, 146, 119, 148, 130, 144, 115, 122, 
    135, 139, 128, 119, 138, 115, 138, 151, 151, NA, 137, 147, 
    124, 168, 152, 156), tadias2_e = c(70, 63, 80, 67, 76, 81, 
    57, 68, 69, 69, 68, 78, 61, 71, 54, 77, 63, 63, 92, 73, 80, 
    88, 84, NA, 79, 76, 62, 90, 87, 89), p17_total = c(10, 10, 
    5, 9, 9, 7, 15, 11, 6, 12, 11, 4, 9, 14, 9, 9, 11, 14, 6, 
    5, 10, 10, 9, 13, 12, 7, 11, 12, 7, 4), geaf_tot = c(1986.01, 
    1286.71, 1230.77, 1510.49, 839.16, 2144.52, 5361.31, 1678.32, 
    4055.94, 2601.4, 3363.64, 3076.92, 5342.66, 2769.23, 2601.4, 
    1693.24, 4055.94, 3146.85, 3916.08, 6405.59, 2442.89, 671.33, 
    867.13, 1585.08, 3153.85, 3188.81, 7986.01, 839.16, 7552.45, 
    2937.06), glucosa = c(127, 97, 95, 102, 119, 113, 109, 105, 
    93, 167, 85, 108, 122, 112, 113, 120, 100, 108, 100, 86, 
    129, 136, 98, 97, 130, 125, 109, 102, NA, 181), albumi = c(4.47, 
    4.82, 4.78, 4.22, 4.59, 4.5, 4.33, 4.87, 4.83, 4.98, 4.23, 
    4.77, 4.76, 4.98, 4.18, 4.51, 4.72, 4.87, 4.77, 4.61, 4.55, 
    4.77, 4.6, 4.59, 4.25, 4.71, 4.47, 4.54, NA, 4.63), coltot = c(157, 
    191, 276, 248, 248, 217, 187, 301, 173, 230, 258, 238, 231, 
    181, 183, 243, 223, 195, 237, 245, 164, 145, 199, 234, 178, 
    192, 201, 198, NA, 159), hdl = c(39, 50, 57, 59, 49, 44, 
    60, 98, 52, 73, 58, 44, 58, 60, 48, 46, 73, 58, 39, 47, 38, 
    45, 59, 56, 72, 34, 78, 62, NA, 54), ldl_calc = c(91, 124, 
    204, 133, 155, 140, 105, 162, 91, 141, 182, 173, 155, 107, 
    83, 150, 132, 124, NA, 167, 101, 88, 121, 160, 84, 130, 112, 
    120, NA, NA), trigli = c(137, 87, 74, 282, 219, 165, 112, 
    203, 149, 78, 89, 105, 91, 71, 259, 236, 92, 63, 447, 157, 
    123, 58, 94, 90, 112, 139, 53, 80, NA, 429), hba1c = c(6.57, 
    5.82, 5.68, 5.96, 6.11, 5.73, 5.48, 5.8, 5.6, 7.8, 5.21, 
    5.73, 6.1, 5.86, 6.37, 6.27, 5.22, 5.59, 5.47, 5.95, 6.96, 
    NA, 5.47, 4.99, NA, 6.25, 5.79, 5.79, NA, 6.54), i_hucpeptide = c(NA, 
    NA, 466.64, 838.61, 847.89, 1481.03, 819.65, NA, 1298.6, 
    NA, 564.59, 544.2, 755.73, 1057.83, 957.43, NA, 957.33, 1002.34, 
    1104, NA, NA, NA, NA, 594.6, NA, 815.82, 922.08, 628.54, 
    NA, 1591.01), i_hughrelin = c(NA, NA, 410.97, 553.65, 453, 
    352.44, 527.01, NA, 328.27, NA, 1668.41, 460.06, 1072.27, 
    260.24, 749.03, NA, 1327.91, 363.79, 524.53, NA, NA, NA, 
    NA, 1051.1, NA, 143.32, 1076.49, 1565.85, NA, 607.31), i_hugip = c(NA, 
    NA, 2.67, 2.67, 2.67, 2.67, 2.67, NA, 2.67, NA, 2.67, 2.67, 
    690.74, 1165.16, 2.67, NA, 2.67, 2.67, 2.67, NA, NA, NA, 
    NA, 2.67, NA, 2.67, 2.67, 2.67, NA, 2.67), i_huglp1 = c(NA, 
    NA, 127.66, 284.34, 200.13, 59.3, 234.84, NA, 503.42, NA, 
    103.9, 14.14, 71.6, 56.41, 75.13, NA, 161.36, 124.19, 220.52, 
    NA, NA, NA, NA, 14.14, NA, 112.57, 100.52, 237.55, NA, 470.91
    ), i_huglucagon = c(NA, NA, 333.79, 649.94, 726.99, 395.38, 
    610.5, NA, 434.42, NA, 502.4, 127.62, 268.23, 10.48, 428.15, 
    NA, 716.02, 238.95, 320.32, NA, NA, NA, NA, 10.48, NA, 238, 
    487.42, 297.6, NA, 495.16), i_huinsulin = c(NA, NA, 129.24, 
    270.98, 299.75, 730.82, 267.54, NA, 616.91, NA, 121.26, 85.34, 
    224.96, 247.48, 220.75, NA, 181.85, 341.25, 551.46, NA, NA, 
    NA, NA, 133.42, NA, 263.87, 279.45, 94.78, NA, 573.14), i_huleptin = c(NA, 
    NA, 3992.49, 17806.43, 8409.76, 11511.43, 2965.92, NA, 3223.08, 
    NA, 9018.79, 1039.45, 2613.33, 2128.98, 7307.89, NA, 13492.13, 
    2883.77, 4775.98, NA, NA, NA, NA, 2602.96, NA, 2829.59, 8511.92, 
    3528.77, NA, 11487.15), i_hupai1 = c(NA, NA, 997.29, 2499.25, 
    3085.25, 1909.44, 1730.55, NA, 3333.37, NA, 1424.3, 1857.71, 
    2578.46, 2268.52, 2222.97, NA, 2722.92, 1300.69, 2732.11, 
    NA, NA, NA, NA, 1204.36, NA, 2483.08, 2289.67, 1791.79, NA, 
    6595.54), i_huresistin = c(NA, NA, 3044.48, 5774.77, 3221.72, 
    4925.57, 5170.95, NA, 3683.64, NA, 4041.32, 6771.31, 5119.11, 
    9521.7, 3328.41, NA, 5061.65, 3773.39, 3039.39, NA, NA, NA, 
    NA, 4405.17, NA, 2577.84, 3433.82, 6802.94, NA, 6461.67), 
    i_huvisfatin = c(NA, NA, 302.3, 2083.46, 2989.72, 1118.7, 
    8.64, NA, 96.03, NA, 2209.51, 8.64, 1944.37, 1415.55, 678.33, 
    NA, 4349.56, 8.64, 410.1, NA, NA, NA, NA, 117, NA, 8.64, 
    2308.8, 228.53, NA, 1766.64), col_rema = c(27, 17, 15, 56, 
    44, 33, 22, 41, 30, 16, 18, 21, 18, 14, 52, 47, 18, 13, NA, 
    31, 25, 12, 19, 18, 22, 28, 11, 16, NA, NA), homa = c(NA, 
    NA, 5.053, 11.374, 14.679, 33.985, 12.001, NA, 23.61, NA, 
    4.242, 3.793, 11.294, 11.406, 10.265, NA, 7.484, 15.167, 
    22.694, NA, NA, NA, NA, 5.326, NA, 13.574, 12.535, 3.978, 
    NA, 42.691), i_pcr = c(NA, NA, 0.41, 0.82, NA, 2.08, 0.08, 
    NA, 0.1, NA, 0.38, 0.05, 0.04, 0.35, 0.2, NA, 0.98, 0.02, 
    NA, NA, NA, NA, NA, 0.2, NA, 0.1, 0.16, 0.16, NA, 2.93)), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))

Afterwards I am defining my iteration and my variables database

ex<- subset(data[, 6:30])

for (i in 1:length(ex)) {
  var_1 <- ex[,i]
  var_1 <- unlist(var_1)
  lme_1 <- lme(var_1 ~ sexo_s1*peso1 + edad_s1 + p17_total + poly(time, 2)*grupo_int_v00,
             random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
           data = dat_longer, subset = !is.na(var_1))

Error in model.frame.default(formula = ~time + var_1 + sexo_s1 + peso1 +  : 
  invalid type (list) for variable 'var_1'

I have tried unlisting/as.data.frame in before running the loop

for (i in 1:length(data)) {
  var_1 <- data[,i]
  var_1 <- unlist(var_1) #or as.data.frame(var_1)
    lme_1 <- lme(var_1 ~ sexo_s1*peso1 + edad_s1 + p17_total + poly(time, 2)*grupo_int_v00,
             random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
           data = dat_longer, subset = !is.na(var_1))
}
Error in model.frame.default(formula = ~time + var_1 + sexo_s1 + peso1 +  : 
  variable lengths differ (found for 'var_1')

I have also tried to develop a new function to iterate over

lme_z <- function(z){
out <-  lme(z ~ sexo_s1*peso1 + edad_s1 + p17_total + poly(time, 2)*grupo_int_v00,
             random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
           data = dat_longer, subset = !is.na(z))
  
}
Error

If there is some contribution to iterate in the response variable (I know Ben Bolker is an expert) Thanks in advance

Upvotes: 0

Views: 161

Answers (2)

Ben Bolker
Ben Bolker

Reputation: 226247

I was going to suggest:

formvars <- c("sexo_s1*peso1",
              "edad_s1",
              "p17_total",
              "poly(time, 2)")
## excluded *grupo_int_v00 since not in example data frame
respvars <- names(df)[7:30]
result <- list()
for (r in respvars) {
  result[[r]] <- lme(reformulate(formvars, response = r),
               random = ~ poly(time, 2)|paciente,
               control=lmeControl(opt="optim"),
               data = df,
               na.action = na.exclude)
}

Many of @MikaelJagan's points are well taken. In particular:

  • grupo_int_v00 excluded since it wasn't in your example data set
  • this code doesn't work for your example since there are only two complete cases (i.e., observations with no missing predictors/responses) in the data set, so we can't fit a quadratic polynomial ("degree must be less than the number of unique points")
  • I used na.exclude, which obviates your subset argument; it excludes NA values when fitting but will re-introduce them e.g. in calculating predictions or residuals

Upvotes: 1

Mikael Jagan
Mikael Jagan

Reputation: 11326

If data is a data frame containing all of the variables that you use in your formula, including all of the responses that you want to consider, then you can do:

f <- function(resp) {
    fixed <- . ~ sexo_s1 * peso1 + edad_s1 + p17_total + poly(time, 2) * grupo_int_v00
    fixed[[2L]] <- as.name(resp)
    lme(fixed = fixed,
        random = ~poly(time, 2) | paciente, 
        data = data,
        subset = !is.na(data[[resp]]),
        control = lmeControl(opt = "optim"))
}

list_of_lme_objects <- lapply(names_of_response_variables, f)

An important piece is:

fixed <- . ~ sexo_s1 * peso1 + edad_s1 + p17_total + poly(time, 2) * grupo_int_v00
fixed[[2L]] <- as.name(resp)

The second statement injects the response named resp into the left hand side of the formula template. A more transparent example:

fixed <- . ~ world
fixed[[2L]] <- as.name("hello")
fixed
## hello ~ world

Another important piece is:

subset = !is.na(data[[resp]])

Here, the right hand side actually evaluates to a logical vector of length equal to the number of rows of data. You might consider passing na.action = na.omit instead of subset, though that will also omit rows where the independent variables have missing values, so the semantics are slightly different.

The variable grupo_int_v00 is missing from your data frame. You'll have to fix that on your end in order to test the code...

Upvotes: 1

Related Questions