Reputation: 1706
I have a data frame with millions of rows and ten columns. My code seems to work but never finishes, I think because of the for loop and the if statement. I want to write it differently but I'm stuck.
df <- data.frame(x = 1:5,
                 y = c("a", "a", "b", "b", "c"),
                 z = sample(5))
for (i in seq_along(df$x)){
  if (df$y[i] == df$y[i+1] & df$y[i] == "a"){
    df$status[i] <- 1
  } else {
    df$status[i] <- "ok"
  }
}
Upvotes: 1
Views: 84
Reputation: 545588
In fact, you can replace the whole loop with a vectorised ifelse:
df$status = ifelse(df$y == df$y[-1] & df$y == 'a', 1, 'ok')
This code will give you a warning, unlike the for loop. However, the warning is actually correct and also concerns your code: you are reading past the last element of df$y when doing df$y[i + 1].
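To see what the warning is about, here is a small sketch on a plain character vector (the names y, past_end, warn_msg and recycled are only illustrative). It shows that indexing one past the end silently yields NA, and that comparing vectors of length 5 and 4 recycles the shorter one, so the last element ends up compared against the first element of the shifted vector:

```r
y <- c("a", "a", "b", "b", "c")

# Indexing one past the end of a vector returns NA instead of failing.
past_end <- y[length(y) + 1]

# Comparing length 5 against length 4 raises a warning; capture its message.
warn_msg <- tryCatch(y == y[-1], warning = function(w) conditionMessage(w))

# With the warning suppressed, the shorter vector is recycled:
# y[-1] is treated as c("a", "b", "b", "c", "a").
recycled <- suppressWarnings(y == y[-1])

print(past_end)
print(warn_msg)
print(recycled)
```

So the result on this sample data happens to match the loop, but only because recycling wraps around to the first element, not because the comparison is meaningful for the last row.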
You can make this warning go away (and make the code arguably clearer) by borrowing the lead function from dplyr (simplified):
lead = function (x, n = 1, default = NA) {
  if (n == 0)
    return(x)
  `attributes<-`(c(x[-seq_len(n)], rep(default, n)), attributes(x))
}
With this, you can rewrite the code ever so slightly and get rid of the warning:
df$status = ifelse(df$y == lead(df$y) & df$y == 'a', 1, 'ok')
It’s a shame that this function doesn’t seem to exist in base R.
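If you would rather not define a helper, the same shift can be written inline in base R. This is only a sketch on the sample data: next_y and is_run_of_a are illustrative names, and wrapping the test in %in% TRUE is my own assumption about what you want, namely that a comparison against the NA padding counts as "no match" so the last row gets 'ok' rather than NA.

```r
df <- data.frame(x = 1:5,
                 y = c("a", "a", "b", "b", "c"),
                 stringsAsFactors = FALSE)

# Shift y one position to the left and pad the end with NA.
next_y <- c(df$y[-1], NA)

# Assumption: treat NA comparisons as FALSE. `%in% TRUE` maps NA to FALSE.
is_run_of_a <- (df$y == next_y & df$y == "a") %in% TRUE

# ifelse mixes 1 and 'ok', so the resulting column is character.
df$status <- ifelse(is_run_of_a, 1, "ok")
print(df)
```

Note that without the %in% TRUE step, a data frame whose last y is "a" would get NA in the last row of status, since NA & TRUE is NA.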
Upvotes: 2