user5199932
user5199932

Reputation: 77

Dplyr rowwise access entire column

Given the following data

   A B
1  1 2
2  2 2
3  3 3
4  4 4
5  5 4

For each row I'd like to find the index where A exceeds B for the first time. So the required answer is:

  A B NextIndex
1 1 2         3
2 2 2         3
3 3 3         4
4 4 4         5
5 5 5         5

My approach with dplyr is

A_col<-foo$A  
foo  %>%  rowwise() %>% mutate(NextIndex=which(A_col-B>0)[1] )

My actual data.frame is several million rows and the processing time increases dramatically. Note, I reference the full A_col in each row comparison and I tried a version using row_number() but didn't achieve a significant speed improvement. Also, note, A and B are actually POSIXct variables in my data.frame and will be strictly increasing in time but not by periodic amounts.

How would I improve the efficiency of this expression?

Upvotes: 1

Views: 327

Answers (2)

akrun
akrun

Reputation: 887088

We can use vapply

foo$nextIndex <- vapply(foo$B, function(x) which(foo$A-x>0)[1], 1)
foo
#   A B nextIndex
#1 1 2         3
#2 2 2         3
#3 3 3         4
#4 4 4         5
#5 5 4         5

Or another option if the values are in order

findInterval(foo$B, foo$A)+1L
#[1] 3 3 4 5 5

Using it in the dplyr chain

foo %>% 
    mutate(rowIndex = findInterval(B, A)+1L)

Upvotes: 2

Gopala
Gopala

Reputation: 10483

How about this:

df$nextIndex <- apply(df, 1, function(x) which.max(df$A - x[2] > 0))
df
  A B nextIndex
1 1 2         3
2 2 2         3
3 3 3         4
4 4 4         5
5 5 4         5

Upvotes: 0

Related Questions