Jimmy Bower

Reputation: 311

Using apply functions instead of for loops in R

I have been trying to replace a for loop in my code with an apply function. I have tried sapply, lapply, apply and mapply, but none of them gave the result I expected. The original loop looks like this:

ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))

for(i in 1:nrow(ds1)){
  if(is.na(ds1$col1[i])){
    ds1$col1[i] <- ds2[ds2[,"colA"] == ds1$col2[i], "colB"]
  }
}

My latest attempt with the apply family looks like this:

ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))

sFunc <- function(x, y, z){
  if(is.na(x)){
    return(z[z[,"colA"] == y, "colB"])
  } else {
    return(x)
  }
}

ds1$col1 <- sapply(ds1$col1, sFunc, ds1$col2, ds2)

This returns ds2$colB for each row. Can someone explain what I got wrong?

Upvotes: 1

Views: 4747

Answers (2)

Allan Cameron

Reputation: 174586

sapply only iterates over its first argument. Any further arguments are passed whole to every call, so inside your function y is the entire ds1$col2 vector each time rather than a single element. To iterate over several vectors in parallel you need the multivariate version of sapply, which is mapply.

sFunc <- function(x, y){
  if(is.na(x)){
    return(ds2[ds2[,"colA"] == y, "colB"])
  } else {
    return(x)
  }
}

mapply(sFunc, ds1$col1, ds1$col2)
#> [1] 90  2
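
To make the difference concrete, here is a minimal sketch that runs the three-argument function from the question through both sapply and mapply. The names sFunc3, res_sapply and res_mapply are made up for this illustration, and passing ds2 through MoreArgs is just one way to hand over an argument that should not be iterated.

```r
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"), stringsAsFactors = FALSE)
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110), stringsAsFactors = FALSE)

# Same logic as the question's sFunc (hypothetical name sFunc3)
sFunc3 <- function(x, y, z) {
  if (is.na(x)) z[z[, "colA"] == y, "colB"] else x
}

# sapply: only col1 is iterated; y is the whole col2 vector on every call,
# so the NA row matches every row of ds2 and gets both colB values back
res_sapply <- sapply(ds1$col1, sFunc3, ds1$col2, ds2)

# mapply: col1 and col2 are iterated together, element by element;
# ds2 is passed unchanged to every call via MoreArgs
res_mapply <- mapply(sFunc3, ds1$col1, ds1$col2, MoreArgs = list(z = ds2))
```

Because the sapply results have different lengths, res_sapply cannot be simplified to a vector and stays a list, while res_mapply is a plain numeric vector with one value per row.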

Upvotes: 3

Ronak Shah

Reputation: 389325

A join would be useful here. You can do it in base R:

transform(merge(ds1, ds2, by.x = "col2", by.y = "colA"), 
          col1 = ifelse(is.na(col1), colB, col1))[names(ds1)]

#  col1 col2
#1   90    A
#2    2    B

Or with dplyr:

library(dplyr)

inner_join(ds1, ds2, by = c("col2" = "colA")) %>%
    mutate(col1 = coalesce(col1, colB)) %>%
    select(all_of(names(ds1)))
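
If you would rather not build a joined data frame at all, a plain vectorised lookup is another option. This is only a sketch of a swapped-in technique (match() plus ifelse(), not a join), and it assumes the keys in ds2$colA are unique, because match() returns only the first matching position. The name idx is made up for this example.

```r
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"), stringsAsFactors = FALSE)
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110), stringsAsFactors = FALSE)

# Position in ds2 of each ds1 key (NA where a key has no match)
idx <- match(ds1$col2, ds2$colA)

# Fill only the missing values of col1 from the looked-up colB values
ds1$col1 <- ifelse(is.na(ds1$col1), ds2$colB[idx], ds1$col1)
```

Unlike merge or inner_join, this keeps the rows of ds1 in their original order and leaves col1 as NA for any key that is not present in ds2.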

Upvotes: 2
