Reputation: 77
Given the following data
A B
1 1 2
2 2 2
3 3 3
4 4 4
5 5 4
For each row I'd like to find the index where A exceeds B for the first time. So the required answer is:
A B NextIndex
1 1 2 3
2 2 2 3
3 3 3 4
4 4 4 5
5 5 5 5
My approach with dplyr
is
A_col<-foo$A
foo %>% rowwise() %>% mutate(NextIndex=which(A_col-B>0)[1] )
My actual data.frame is several million rows and the processing time increases dramatically. Note, I reference the full A_col
in each row comparison and I tried a version using row_number()
but didn't achieve a significant speed improvement.
Also, note, A and B are actually POSIXct
variables in my data.frame and will be strictly increasing in time but not by periodic amounts.
How would I improve the efficiency of this expression?
Upvotes: 1
Views: 327
Reputation: 887088
We can use vapply
foo$nextIndex <- vapply(foo$B, function(x) which(foo$A-x>0)[1], 1)
foo
# A B nextIndex
#1 1 2 3
#2 2 2 3
#3 3 3 4
#4 4 4 5
#5 5 4 5
Or another option if the values are in order
findInterval(foo$B, foo$A)+1L
#[1] 3 3 4 5 5
Using it in the dplyr
chain
foo %>%
mutate(rowIndex = findInterval(B, A)+1L)
Upvotes: 2
Reputation: 10483
How about this:
df$nextIndex <- apply(df, 1, function(x) which.max(df$A - x[2] > 0))
df
A B nextIndex
1 1 2 3
2 2 2 3
3 3 3 4
4 4 4 5
5 5 4 5
Upvotes: 0