David LeBauer

Reputation: 31761

Can I avoid these nested for loops?

I have a data frame with a response variable, 'y', and three factors, 'factor.a', 'factor.b', and 'factor.c'.

I am trying to write a function that will

  1. remove a column from the data frame if all values of that factor are the same (it has only one level)

  2. add the terms 'beta.factor.x[1..n]' to a vector of parameters when there is more than one level of a factor, up to 5 levels.

  3. exclude the parameter beta.factor.b[1] from the list (it is fixed)

Here is my code. I think it looks nice and works well, but I have read that it is best to avoid nested for loops, so I am curious if there is a more efficient approach.

data <- data.frame(       y = c(1,2,3,4),
                   factor.a = c(1, 1, 2, 1),
                   factor.b = c(1, 2, 2, 3),
                   factor.c = c(0, 0, 0, 0))

model.parms <- list(factor.a  = length(unique(data$factor.a)),
                    factor.b  = length(unique(data$factor.b)),
                    factor.c  = length(unique(data$factor.c)))
vars <- 'beta.o'
for (x in c('factor.a','factor.c', 'factor.b')) {
  if(model.parms[[x]] == 1) {
    data <- data[, -which(names(data) == x)]
  } else {
    m <- min(model.parms[[x]], 5)
    for (i in 1:m) {
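      # skip only beta.factor.b[1], which is fixed; '!i == 1 && ...' parses
      # as '(i != 1) && ...', which would drop every factor.a term as well
      if(!(i == 1 && x == 'factor.b')) {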
        vars <- c(vars, paste('beta.', x, '[', i, ']', sep=''))
      }
    }
  }
}

Upvotes: 1

Views: 894

Answers (2)

Thierry

Reputation: 18487

You don't need any loops at all:

vars <- c('beta.o',
  paste('sd.', names(model.parms)[model.parms > 1], sep = ''),
  paste('beta.factor.b', '[', 1 +  seq_len(min(model.parms[["factor.b"]], 5) - 1), ']', sep='')
)
# keep the response 'y' along with the factors that have more than one level
data <- data[, c('y', names(model.parms)[model.parms > 1])]
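
A minimal sketch of the same loop-free idea, applied to the three requirements listed in the question, might look like the following. The names `factors`, `n.levels`, `keep` and `beta.terms` are illustrative and not from the original post, and it assumes the response column is called `y`.

```r
data <- data.frame(y = c(1, 2, 3, 4),
                   factor.a = c(1, 1, 2, 1),
                   factor.b = c(1, 2, 2, 3),
                   factor.c = c(0, 0, 0, 0))

# hypothetical helper vector naming the factor columns
factors <- c('factor.a', 'factor.b', 'factor.c')

# number of distinct levels per factor, as a named vector
n.levels <- sapply(data[factors], function(col) length(unique(col)))

# factors with more than one level are the ones to keep
keep <- names(n.levels)[n.levels > 1]

# build 'beta.<factor>[i]' for i = 1..min(levels, 5), one factor at a time
beta.terms <- unlist(lapply(keep, function(f) {
  paste('beta.', f, '[', seq_len(min(n.levels[[f]], 5)), ']', sep = '')
}))

# beta.factor.b[1] is fixed, so it is left out of the parameter vector
vars <- c('beta.o', setdiff(beta.terms, 'beta.factor.b[1]'))

# keep the response plus the multi-level factors
data <- data[, c('y', keep), drop = FALSE]
```

Here `sapply` and `lapply` replace the outer and inner loops respectively, and `setdiff` handles the single excluded term without a special case inside the iteration.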

Upvotes: 2

hatmatrix

Reputation: 44972

You can often avoid nested loops with by(). Taking your data frame,

> out <- by(data,data[,-1],identity)
> out

will get you

factor.a: 1
factor.b: 1
factor.c: 0
  y factor.a factor.b factor.c
1 1        1        1        0
------------------------------------------------------------ 
factor.a: 2
factor.b: 1
factor.c: 0
NULL
------------------------------------------------------------ 
factor.a: 1
factor.b: 2
factor.c: 0
  y factor.a factor.b factor.c
2 2        1        2        0
------------------------------------------------------------ 
factor.a: 2
factor.b: 2
factor.c: 0
  y factor.a factor.b factor.c
3 3        2        2        0
------------------------------------------------------------ 
factor.a: 1
factor.b: 3
factor.c: 0
  y factor.a factor.b factor.c
4 4        1        3        0
------------------------------------------------------------ 
factor.a: 2
factor.b: 3
factor.c: 0
NULL

If you unclass(out), you will get a matrix or array of mode list; each element contains the rows of the original data frame that fall into one combination of the levels given in the second argument of by(). Of course, you can replace the identity function with another function that operates on each subset of the data frame (the output will always be a matrix or array, but not necessarily of mode list, depending on what your function returns).
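
As a small sketch of that last point, assuming the same example data frame as in the question, the identity function can be swapped for one that returns a single number per subset. The names `sums` and `arr` are illustrative only.

```r
data <- data.frame(y = c(1, 2, 3, 4),
                   factor.a = c(1, 1, 2, 1),
                   factor.b = c(1, 2, 2, 3),
                   factor.c = c(0, 0, 0, 0))

# sum of y within each combination of factor.a, factor.b and factor.c
sums <- by(data, data[, -1], function(d) sum(d$y))

# dropping the 'by' class leaves a plain numeric array,
# with one dimension per grouping factor
arr <- unclass(sums)
dim(arr)

# cells are addressed by factor level; combinations with no rows are NA
arr['1', '2', '0']
arr['2', '1', '0']
```

Because the function returns a scalar, the result is an ordinary numeric array rather than an array of mode list, so it can be indexed and summarised directly.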

Upvotes: 1
