Reputation: 14958
I am trying to iterate a function progressively through the columns of a database. What are the options available for this? Am I limited to a for loop or is there a dplyr approach or similarly intuitive code structure I can use?
Essentially I have a matrix much much larger than the one constructed below, but of the same general structure. The first column explains which version to select as the refined datacollection, and then all the data that comprises each of these three versions.
library(dplyr)
# Function: creates a matrix of random strings, v = versionNumber
matADv.maker <- function (v){
matADv <- data.frame(matrix(sample(letters[1:26], 10), nrow = 5))
colnames(matADv) <- paste0("v", v, "_", letters[24:25])
return(matADv)
}
set.seed(1)
lvl <- data.frame(c(as.integer(runif(5, 5, 8))))
colnames(lvl) <- "Level"
matADv5 <- matADv.maker(5)
matADv6 <- matADv.maker(6)
matADv7 <- matADv.maker(7)
matComp <- bind_cols(lvl, matADv5, matADv6, matADv7)
matComp
Source: local data frame [5 x 8]
rowname Level v5_x v5_y v6_x v6_y v7_x v7_y (chr) (int) (chr) (chr) (chr) (chr) (chr) (chr) 1 1 5 x e m t k z 2 2 6 z d r e a l 3 3 6 p n x z j x 4 4 7 o g i c u d 5 5 5 b s y u h o
I was wondering if there isn't instead some simple way to shuttle a function along the dataframe.
Essentially, How might I be able to iterate through columns by using dplyr or similar simpler constructions than for loops?
Upvotes: 3
Views: 2818
Reputation: 14958
Well, I now realize Hadley had already prepared the way for the best solution of all; I need to process the column transformations in bulk. I collect all the columns with the same issue (filter), resolve the issues, and then join all the frames back together in a Split, Apply, Combine approach.
More efficient too: tackling the problem as 3 tables instead of hundreds of columnar iterations testing each data point and then reacting consecutively.
library(magrittr)
matComp %<>% add_rownames
v5Mat <- matComp %>% filter(Level == 5) %>% select(rowname, starts_with("v5"))
v6Mat <- matComp %>% filter(Level == 6) %>% select(rowname, starts_with("v6"))
v7Mat <- matComp %>% filter(Level == 7) %>% select(rowname, starts_with("v7"))
colnames(v5Mat) %<>% gsub("v\\d_", "ref", .)
colnames(v6Mat) %<>% gsub("v\\d_", "ref", .)
colnames(v7Mat) %<>% gsub("v\\d_", "ref", .)
refinedMat <-
Reduce(function(...) merge(..., all=TRUE), list(matComp, v5Mat, v6Mat, v7Mat)) %>%
group_by(rowname) %>%
summarise_each(funs(na.omit))
refinedMat
results in:
Source: local data frame [5 x 10]
rowname refx refy Level v5_x v5_y v6_x v6_y v7_x v7_y
(chr) (chr) (chr) (int) (chr) (chr) (chr) (chr) (chr) (chr)
1 1 x e 5 x e m t k z
2 2 r e 6 z d r e a l
3 3 x z 6 p n x z j x
4 4 u d 7 o g i c u d
5 5 b s 5 b s y u h o
Upvotes: 0
Reputation: 93811
You can avoid ifelse
and looping by taking advantage of the regularity of the column names. Here's a base R solution:
For each row of matComp
, you want to find the column whose column name contains the correct value of Level
for a given "suffix" (x or y) and assign that value to a new refdat
column. You can do this with the match
function.
In the code below, we use apply
to go through each row of matComp
. In each row, we use match
to find the index of the desired column so that we can return the correct value for refdat
. sapply
iterates over the two suffixes, and cbind
adds the two new columns to matcomp
.
matComp = cbind(matComp,
sapply(paste0("refdat_",c("x","y")), function(var) {
suffix = substr(var, nchar(var), nchar(var))
apply(matComp, 1, function(vec) {
vec[match(paste0("v", vec["Level"], "_", suffix), names(vec))]
})
}), stringsAsFactors=FALSE)
Level v5_x v5_y v6_x v6_y v7_x v7_y refdat_x refdat_y 1 5 x e m t k z x e 2 6 z d r e a l r e 3 6 p n x z j x x z 4 7 o g i c u d u d 5 5 b s y u h o b s
Upvotes: 1