Translate R for loop into apply function

Question

I have written a for loop in my code:

for(i in 2:nrow(ProductionWellYear2)) {
  if (ProductionWellYear2[i, ncol(ProductionWellYear2)] == 0) {
    ProductionWellYear2[i, ncol(ProductionWellYear2)] = ProductionWellYear2[i - 1, ncol(ProductionWellYear2)] + 1
  } else {
    ProductionWellYear2[i, ncol(ProductionWellYear2)] = ProductionWellYear2[i, ncol(ProductionWellYear2)]
  }
}

However, this is very time-consuming, as the data frame has over 800k rows. How can I make this faster and avoid the for loop?

Gaffi · Accepted Answer

This should work for you, but without seeing your data I can't verify that the results are what you want. That said, the process is not much different from your original loop; benchmarking does show it running faster on my example data, though not necessarily "fast".
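To make "the process" concrete, here is the same rule written against a plain vector so it can be checked by hand. This is a minimal sketch: `fill_zeros_loop` is a hypothetical helper name, and the input is assumed to contain no `NA`s (the original `if` stops with an error on `NA`).

```r
# Hypothetical helper: the question's loop rule applied to a plain numeric vector.
# Assumes x contains no NA values.
fill_zeros_loop <- function(x) {
  if (length(x) < 2) return(x)
  for (i in 2:length(x)) {
    if (x[i] == 0) {
      # x[i - 1] was already updated on the previous iteration,
      # so a run of zeros counts upward from the last non-zero value
      x[i] <- x[i - 1] + 1
    }
  }
  x
}

fill_zeros_loop(c(3, 0, 0, -2, 0, 5))
#[1]  3  4  5 -2 -1  5
```

Running this on the extracted last column and assigning the result back once, e.g. `ProductionWellYear2[[ncol(ProductionWellYear2)]] <- fill_zeros_loop(ProductionWellYear2[[ncol(ProductionWellYear2)]])`, avoids calling `[<-.data.frame` on every iteration, which is typically a large share of the cost of the original loop.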

library(microbenchmark)
# Create fake data
set.seed(1)
ProductionWellYear <- data.frame(A = as.integer(rnorm(2500)),
                                 B = as.integer(rnorm(2500)),
                                 C = as.integer(rnorm(2500))
)

# Copy it to confirm results of both processes are the same
ProductionWellYear2 <- ProductionWellYear


# Slightly modified original version
method1 <- function() {
  cols <- ncol(ProductionWellYear)
  for(i in 2:nrow(ProductionWellYear)) {
    if (ProductionWellYear[i, cols] == 0) {
      ProductionWellYear[i, cols] = ProductionWellYear[i - 1, cols] +1
    }
    else {
      ProductionWellYear[i, cols] = ProductionWellYear[i, cols]
    }
  }
  # The loop modifies a local copy, so return it for the comparison below
  ProductionWellYear
}

# New version
method2 <- function() {
  cols <- ncol(ProductionWellYear2)
  sapply(2:nrow(ProductionWellYear2), function(i) {
    if (ProductionWellYear2[i, cols] == 0) {
      # <<- writes to ProductionWellYear2 in the global environment, so later
      # calls (including the benchmark's repeats) find most zeros already filled in
      ProductionWellYear2[i, cols] <<- ProductionWellYear2[i - 1, cols] +1
    }
  })
}


# Comparing the outputs (both methods must run before comparing)
result1 <- method1()
invisible(method2())
all(result1 == ProductionWellYear2)
#[1] TRUE

result <- microbenchmark(method1(), method2())
result
#Unit: milliseconds
#      expr      min       lq     mean   median       uq       max neval
#  method1() 151.78802 167.3932 190.14905 176.2855 197.60406 337.9904   100
#  method2()  45.56065  53.7744  67.55549  59.9299  72.81873 174.1417   100
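A different technique from the `sapply` approach above removes the R-level loop entirely. Each zero at position `i` ends up as `x[a] + (i - a)`, where `a` is the last position at or before `i` holding a non-zero value (or position 1), and `cummax()` can carry that position forward. A minimal sketch, assuming the target column is numeric and free of `NA`s; `fill_zeros_vec` is a hypothetical helper name:

```r
# Hypothetical helper: vectorized version of the question's loop rule.
# Assumes x is numeric and contains no NA values.
fill_zeros_vec <- function(x) {
  idx <- seq_along(x)
  # Positions that keep their own value: every non-zero, plus the first element
  anchor <- ifelse(x != 0 | idx == 1, idx, 0L)
  # Carry the most recent anchor position forward over runs of zeros
  anchor <- cummax(anchor)
  # Each zero counts upward from its anchor's value
  x[anchor] + (idx - anchor)
}

fill_zeros_vec(c(3, 0, 0, -2, 0, 5))
#[1]  3  4  5 -2 -1  5
fill_zeros_vec(c(0, 0, 7, 0))
#[1] 0 1 7 8
```

Applied to the data frame from the question, this would be `cols <- ncol(ProductionWellYear2)` followed by `ProductionWellYear2[[cols]] <- fill_zeros_vec(ProductionWellYear2[[cols]])`. Because it makes a fixed number of vectorized passes over the column, it should scale to 800k rows much better than element-wise assignment, but it is worth checking it against `method1()` on your own data first.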
