I have a data frame with a response variable, y, and three factors: 'factor.a', 'factor.b', and 'factor.c'.
I am trying to write a function that will:
1. remove a factor's column from the data frame if all of its values are the same (i.e. it has only one level);
2. add the terms 'beta.factor.x[1..n]' to a vector of parameters when a factor has more than one level, up to a maximum of 5 levels;
3. exclude the parameter beta.factor.b[1] from the list (it is fixed).
Here is my code. I think it looks nice and works well, but I have read that it is best to avoid nested for loops, so I am curious whether there is a more efficient approach.
data <- data.frame(y        = c(1, 2, 3, 4),
                   factor.a = c(1, 1, 2, 1),
                   factor.b = c(1, 2, 2, 3),
                   factor.c = c(0, 0, 0, 0))
model.parms <- list(factor.a = length(unique(data$factor.a)),
                    factor.b = length(unique(data$factor.b)),
                    factor.c = length(unique(data$factor.c)))
vars <- 'beta.o'
for (x in c('factor.a', 'factor.c', 'factor.b')) {
  if (model.parms[[x]] == 1) {
    # drop a factor whose values are all the same
    data <- data[, -which(names(data) == x)]
  } else {
    m <- min(model.parms[[x]], 5)
    for (i in 1:m) {
      # add beta.<factor>[i] for every level except the fixed beta.factor.b[1]
      if (!(i == 1 && x == 'factor.b')) {
        vars <- c(vars, paste('beta.', x, '[', i, ']', sep = ''))
      }
    }
  }
}
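A loop-free sketch of the three requirements listed in the question, for comparison. It assumes every factor with more than one level should get beta terms (capped at 5 levels) and that only beta.factor.b[1] is excluded; the names `factors`, `n.levels`, `keep` and `betas` are illustrative, and vapply()/Map()/sprintf() are just one way to do it.

```r
# Toy data from the question
data <- data.frame(y        = c(1, 2, 3, 4),
                   factor.a = c(1, 1, 2, 1),
                   factor.b = c(1, 2, 2, 3),
                   factor.c = c(0, 0, 0, 0))

factors <- c("factor.a", "factor.b", "factor.c")

# Number of distinct levels per factor column (named integer vector)
n.levels <- vapply(data[factors], function(col) length(unique(col)), integer(1))

# Requirement 1: keep y plus every factor with more than one level
keep <- factors[n.levels > 1]
data <- data[, c("y", keep)]

# Requirement 2: beta.<factor>[1..min(n, 5)] for each remaining factor
betas <- unlist(Map(function(f, n) sprintf("beta.%s[%d]", f, seq_len(min(n, 5))),
                    keep, n.levels[keep]),
                use.names = FALSE)

# Requirement 3: beta.factor.b[1] is fixed, so drop it
vars <- c("beta.o", setdiff(betas, "beta.factor.b[1]"))
vars
```

With the toy data this keeps y, factor.a and factor.b, and gives beta.o, beta.factor.a[1], beta.factor.a[2], beta.factor.b[2] and beta.factor.b[3].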
Answer:
You don't need any loops at all:
vars <- c('beta.o',
          paste('sd.', names(model.parms)[model.parms > 1], sep = ''),
          paste('beta.factor.b', '[', 1 + seq_len(min(model.parms[["factor.b"]], 5) - 1), ']', sep = ''))
# keep the response y along with the factors that have more than one level
data <- data[, c('y', names(model.parms)[model.parms > 1])]
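The `1 + seq_len(min(n, 5) - 1)` expression above yields the level indices 2..min(n, 5). A quick sketch of what it produces, using a hypothetical helper `b.index()`; it also shows one edge case: when factor.b has only one level, paste() still emits a name with empty brackets, whereas sprintf() returns nothing.

```r
# Hypothetical helper: the level indices used in the answer, 2..min(n, 5)
b.index <- function(n) 1 + seq_len(min(n, 5) - 1)

b.index(3)  # 2 3
b.index(8)  # 2 3 4 5 (capped at 5 levels)
b.index(1)  # numeric(0): factor.b has a single level

# paste() treats the zero-length index as "", so one malformed name remains
paste('beta.factor.b', '[', b.index(1), ']', sep = '')  # "beta.factor.b[]"

# sprintf() returns character(0) instead, which simply drops out of c()
sprintf('beta.factor.b[%d]', b.index(1))                # character(0)
```

So if factor.b can ever be constant, building the names with sprintf() (or guarding the paste() call) avoids a stray 'beta.factor.b[]' entry.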
Answer:
You can often avoid nested loops with by(). Taking your data frame,
> out <- by(data, data[, -1], identity)
> out
gives
factor.a: 1
factor.b: 1
factor.c: 0
y factor.a factor.b factor.c
1 1 1 1 0
------------------------------------------------------------
factor.a: 2
factor.b: 1
factor.c: 0
NULL
------------------------------------------------------------
factor.a: 1
factor.b: 2
factor.c: 0
y factor.a factor.b factor.c
2 2 1 2 0
------------------------------------------------------------
factor.a: 2
factor.b: 2
factor.c: 0
y factor.a factor.b factor.c
3 3 2 2 0
------------------------------------------------------------
factor.a: 1
factor.b: 3
factor.c: 0
y factor.a factor.b factor.c
4 4 1 3 0
------------------------------------------------------------
factor.a: 2
factor.b: 3
factor.c: 0
NULL
If you call unclass(out), you get a matrix or array of mode list; each element contains the rows of the original data frame for one combination of the levels given in the second argument of by() (NULL where that combination does not occur). Of course, you can replace the identity function with another function that operates on each subset of the data frame; the output will always be a matrix or array, but not necessarily of mode list, depending on what your function returns.
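To make both points concrete, a short sketch that rebuilds the toy data, checks the mode and dimensions of unclass(out), and then swaps identity for a function returning one number per subset (the mean of y, chosen only for illustration).

```r
data <- data.frame(y        = c(1, 2, 3, 4),
                   factor.a = c(1, 1, 2, 1),
                   factor.b = c(1, 2, 2, 3),
                   factor.c = c(0, 0, 0, 0))

# identity: each cell of the result holds a data frame (NULL if empty)
out <- by(data, data[, -1], identity)
cells <- unclass(out)
mode(cells)  # "list"
dim(cells)   # 2 3 1: levels of factor.a x factor.b x factor.c

# A function returning one number per subset: by() simplifies to a numeric array
mean.y <- by(data, data[, -1], function(d) mean(d$y))
unclass(mean.y)[, , 1]  # mean of y per factor.a x factor.b cell, NA where empty
```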