Reputation: 311
I have been trying to replace a for loop in my code with an apply function. I have tried sapply, lapply, apply and mapply, but none of my attempts worked. The original loop looks like this:
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))
for(i in 1:nrow(ds1)){
  if(is.na(ds1$col1[i])){
    ds1$col1[i] <- ds2[ds2[,"colA"] == ds1$col2[i], "colB"]
  }
}
My latest attempt with the apply family looks like this:
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))
sFunc <- function(x, y, z){
  if(is.na(x)){
    return(z[z[,"colA"] == y, "colB"])
  } else {
    return(x)
  }
}
ds1$col1 <- sapply(ds1$col1, sFunc, ds1$col2, ds2)
This returns ds2$colB for each row. Can someone explain what I got wrong here?
Upvotes: 1
Reputation: 174586
sapply only iterates over the first vector you pass. Any other arguments are passed whole to every call rather than one element at a time. To iterate over several vectors in parallel you need the multivariate version of sapply, which is mapply.
sFunc <- function(x, y){
  if(is.na(x)){
    return(ds2[ds2[,"colA"] == y, "colB"])
  } else {
    return(x)
  }
}
mapply(sFunc, ds1$col1, ds1$col2)
#> [1] 90 2
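To see the difference concretely, here is a minimal sketch that reports how much of the second argument each call receives. The helper `show_args` is hypothetical and exists only for this illustration.

```r
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))

# Hypothetical helper: returns how many elements of y a single call sees.
show_args <- function(x, y) length(y)

# sapply iterates over col1 only; col2 is passed whole to every call.
s_lengths <- sapply(ds1$col1, show_args, ds1$col2)

# mapply iterates over col1 and col2 in parallel, one element of each per call.
m_lengths <- mapply(show_args, ds1$col1, ds1$col2)

print(s_lengths)
#> [1] 2 2
print(m_lengths)
#> [1] 1 1
```

With sapply every call sees both values of col2, so the comparison against colA matches every row of ds2 instead of just one.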
Upvotes: 3
Reputation: 389325
A join would be useful here. You can do it in base R:
transform(merge(ds1, ds2, by.x = "col2", by.y = "colA"),
          col1 = ifelse(is.na(col1), colB, col1))[names(ds1)]
#  col1 col2
#1   90    A
#2    2    B
Or with dplyr:
library(dplyr)
inner_join(ds1, ds2, by = c("col2" = "colA")) %>%
  mutate(col1 = coalesce(col1, colB)) %>%
  select(names(ds1))
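If neither a join nor an apply call is wanted, the same lookup can probably be written as a single vectorised assignment. This is a sketch using base R's match(), a different technique from the join shown here, and it assumes every col2 value that needs filling appears exactly once in ds2$colA. The name `needs_fill` is only illustrative.

```r
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))

# Logical index of the ds1 rows whose col1 is missing.
needs_fill <- is.na(ds1$col1)

# match() returns, for each col2 value, the position of its first match in
# ds2$colA; that position is then used to pick the corresponding colB value.
ds1$col1[needs_fill] <- ds2$colB[match(ds1$col2[needs_fill], ds2$colA)]

print(ds1)
```

Unlike the merge-based version, this keeps the original row order of ds1, and a col2 value with no match in colA simply leaves NA in place.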
Upvotes: 2