Reputation: 65
I am trying to fit a Poisson regression model in R to a dataset in which two data frame columns are lists holding vectors of different lengths, like so:
test <- data.frame(a = 1:10, b = rnorm(10))
test$c <- vector("list", nrow(test))  # empty list column, one slot per row
test$d <- vector("list", nrow(test))
for(i in 1:nrow(test)) {
test$c[[i]] <- LETTERS[1:sample(10:11, 1)]
test$d[[i]] <- LETTERS[1:sample(10:11, 1)]
}
I need to build a model to predict a from b and the vectors c and d. As it is not possible to pass lists to glm, I tried unlisting c and d to feed them into the model, but this just creates one long vector for each of c and d, so I get this error:
m0.glm <- glm(a ~ b + unlist(c) + unlist(d), data = test)
Error in model.frame.default(formula = a ~ b + unlist(c) + unlist(d), :
variable lengths differ (found for 'unlist(c)')
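To see where the mismatch comes from, it can help to compare the number of rows with the length of the flattened list. This is a minimal sketch that rebuilds only column c from the example above; the set.seed call is an assumption added for reproducibility.

```r
set.seed(1)  # assumption: added only so the sketch is reproducible
test <- data.frame(a = 1:10, b = rnorm(10))
test$c <- vector("list", nrow(test))
for (i in seq_len(nrow(test))) {
  test$c[[i]] <- LETTERS[1:sample(10:11, 1)]
}

nrow(test)              # 10 rows, so glm expects 10 values per variable
lengths(test$c)         # each list element holds 10 or 11 letters
length(unlist(test$c))  # at least 100 values after flattening, not 10
```

Because unlist() concatenates every element, the result no longer lines up with the rows of a and b, which is what model.frame complains about.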
I feel like there will be a simple solution that I am missing to my problem, but I have not had to attempt to pass a list of vectors to a model before.
Thanks in advance.
Upvotes: 0
Views: 305
Reputation: 5456
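If the problem is how to build a data frame out of the lists, then one option is to pad each vector with NA to a common length and bind the results as columns: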
test <- data.frame(a = 1:10, b = rnorm(10))
test$c <- vector("list", nrow(test))  # empty list column, one slot per row
test$d <- vector("list", nrow(test))
for(i in 1:nrow(test)) {
test$c[[i]] <- LETTERS[1:sample(10:11, 1)]
test$d[[i]] <- LETTERS[1:sample(10:11, 1)]
}
# Pad every vector with NA up to the longest length, then stack them as matrix rows
pad_rows <- function(lst) {
  n <- max(lengths(lst))
  do.call(rbind, lapply(lst, function(x) {
    res <- rep(NA_character_, n)
    res[seq_along(x)] <- x
    res
  }))
}
test_c_df <- pad_rows(test$c)
test_d_df <- pad_rows(test$d)
test_new <- cbind(test[c("a", "b")], test_c_df, test_d_df)
names(test_new) <- make.unique(names(test_new))
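# glm() defaults to family = gaussian; pass family = poisson for a Poisson model.
# Check first that the padded columns are a reasonable encoding of your data.
m0.glm <- glm(a ~ ., data = test_new)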
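With the sample data, the padded columns are mostly constant letters, so they may not give glm much to work with. One alternative is to encode each list column as 0/1 indicator columns, one per letter, and fit with family = poisson as the question asks. This is a sketch under stated assumptions, not the method used above: the helper to_indicators, the column prefixes and the seed are all hypothetical.

```r
set.seed(42)  # assumption: for reproducibility only
test <- data.frame(a = 1:10, b = rnorm(10))
test$c <- lapply(seq_len(nrow(test)), function(i) LETTERS[1:sample(10:11, 1)])
test$d <- lapply(seq_len(nrow(test)), function(i) LETTERS[1:sample(10:11, 1)])

# Hypothetical helper: one 0/1 column per distinct value found in the list column
to_indicators <- function(lst, prefix) {
  lev <- sort(unique(unlist(lst)))
  m <- do.call(rbind, lapply(lst, function(x) as.integer(lev %in% x)))
  colnames(m) <- paste0(prefix, "_", lev)
  m
}

X <- cbind(to_indicators(test$c, "c"), to_indicators(test$d, "d"))
# Drop columns that never vary; they carry no information for the model
X <- X[, apply(X, 2, function(col) length(unique(col)) > 1), drop = FALSE]

test_ind <- cbind(test[c("a", "b")], X)
m1.glm <- glm(a ~ ., data = test_ind, family = poisson)
```

Whether indicators, vector lengths, or some other summary is the right encoding depends on what the vectors in c and d actually represent.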
Upvotes: 1