Mostafa90

Reputation: 1706

Avoid a loop to improve R code

I have a data frame with millions of rows and ten columns. My code seems to work but never finishes, I think because of the for loop and the if statement. I want to write it differently but I'm stuck.

df <- data.frame(x = 1:5,
                 y = c("a", "a", "b", "b", "c"),
                 z = sample(5))

for (i in seq_along(df$x)){
  if (df$y[i] == df$y[i+1] & df$y[i] == "a"){
    df$status[i] <- 1
  } else {
    df$status[i] <- "ok"
  }
}

Upvotes: 1

Views: 84

Answers (1)

Konrad Rudolph

Reputation: 545588

In fact, you can replace the whole loop with a vectorised ifelse:

df$status = ifelse(df$y == df$y[-1] & df$y == 'a', 1, 'ok')

This code will give you a warning, unlike the for loop, because the two sides of == differ in length. However, the warning is actually correct and also concerns your code: you are reading past the last element of df$y when doing df$y[i + 1] on the final iteration.
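To make both points concrete, here is a small sketch on the example column alone (the variable name y is just for illustration). It shows that indexing one past the end quietly returns NA, and that dropping the first element leaves a vector that is one element shorter than the original:

```r
y <- c("a", "a", "b", "b", "c")

# Indexing one past the end does not fail; it yields NA.
y[length(y) + 1]   # NA

# y[-1] drops the first element, so it is one shorter than y.
# Comparing the two with == recycles the shorter vector and warns.
length(y)          # 5
length(y[-1])      # 4
```

In the loop the NA goes unnoticed here only because the last value is not "a"; with a trailing "a" the if condition would itself be NA and fail.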

You can make this warning go away (and make the code arguably clearer) by borrowing the lead function from dplyr (simplified):

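lead = function (x, n = 1, default = NA) {
    # Nothing to shift.
    if (n == 0)
        return(x)

    # Drop the first n elements, pad the end with `default`,
    # then reapply the original attributes of x.
    `attributes<-`(c(x[-seq_len(n)], rep(default, n)), attributes(x))
}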

With this, you can rewrite the code ever so slightly and get rid of the warning:

df$status = ifelse(df$y == lead(df$y) & df$y == 'a', 1, 'ok')
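As a rough end-to-end check, the snippet below repeats the simplified lead so that it runs on its own, applies it to the example data, and looks at two details that are easy to miss. The vector y2 is a made-up edge case, not part of the question, and stringsAsFactors = FALSE is set only so the sketch behaves the same on older R versions:

```r
# Simplified lead() from above, repeated so this snippet is self-contained.
lead <- function(x, n = 1, default = NA) {
  if (n == 0) return(x)
  `attributes<-`(c(x[-seq_len(n)], rep(default, n)), attributes(x))
}

df <- data.frame(x = 1:5,
                 y = c("a", "a", "b", "b", "c"),
                 stringsAsFactors = FALSE)

df$status <- ifelse(df$y == lead(df$y) & df$y == "a", 1, "ok")

df$status          # "1" "ok" "ok" "ok" "ok"
class(df$status)   # "character": mixing 1 and "ok" coerces the 1 to "1"

# Hypothetical edge case: when the last value is "a", the comparison with
# the padded NA is NA, and ifelse passes that NA through.
y2 <- c("b", "a")
edge <- ifelse(y2 == lead(y2) & y2 == "a", 1, "ok")
edge               # "ok" NA
```

If the NA in the last position is unwanted, one option is to pass a different default to lead, for example an empty string, so the final comparison is simply FALSE.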

It’s a shame that this function doesn’t seem to exist in base R.
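For a one-off shift, the same effect can be approximated inline in base R without a helper. This is only a sketch: the names next_y and same_as_next are made up for illustration, and treating the last row as "ok" (because it has no successor) is an assumption about the desired behaviour rather than something stated in the question.

```r
df <- data.frame(x = 1:5,
                 y = c("a", "a", "b", "b", "c"),
                 stringsAsFactors = FALSE)

# Shift y one position to the left and pad the end with NA.
next_y <- c(df$y[-1], NA)

# Treat "no next element" as "not equal" so the result contains no NA.
same_as_next <- !is.na(next_y) & df$y == next_y

df$status <- ifelse(same_as_next & df$y == "a", 1, "ok")
df$status   # "1" "ok" "ok" "ok" "ok"
```

Everything here operates on whole vectors, so it should scale to millions of rows far better than the element-wise loop.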

Upvotes: 2
