user213544

Filter conditionally in a pipe in R

I have a lot of data frames, each with several columns. Two of these columns are time and value.

Minimal example

library(tidyverse)

df <- approx(seq(1,10,1), c(1,5,7,11,4,12,30, 20, 10, 9)) %>% 
      as.data.frame() %>% 
      rename(time = x, value = y)

Goal

I want to remove all rows from each data frame, starting at the first row where value > 10.

When the data frame contains values > 10, the following works:

df <- df %>% 
         filter(row_number() <= first(which(value > 10))-1)

However, some data frames never have a value above 10, e.g.,

df <- approx(seq(1,10,1), c(1,5,7,1,4,2,1, 2, 1, 9)) %>% 
      as.data.frame() %>% 
      rename(time = x, value = y)

In this case, the data frame should not be filtered at all, because the threshold is never reached. The filter solution above, however, returns an empty data frame.
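The empty result comes from how first() and filter() treat a missing match. The small check below uses the values of the second example placed directly in a data frame (df_demo, vals, hits and idx are illustrative names, and the data is not the interpolated output of approx()):

```r
library(dplyr)

# Values from the question's second example (value never exceeds 10)
vals <- c(1, 5, 7, 1, 4, 2, 1, 2, 1, 9)
df_demo <- data.frame(time = seq_along(vals), value = vals)

# which() finds no element above 10, so it returns integer(0)
hits <- which(df_demo$value > 10)

# first() of an empty vector returns NA instead of raising an error
idx <- first(hits)

# row_number() <= NA - 1 is NA for every row, and filter() drops NA rows
out <- df_demo %>% filter(row_number() <= idx - 1)

length(hits)  # 0
is.na(idx)    # TRUE
nrow(out)     # 0
```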

Question

How would you solve this problem inside a dplyr pipe? Is it possible to do conditional filtering?

Answers (1)

Ronak Shah

You could write a conditional statement inside filter:

library(dplyr)

df %>% 
    filter(if(any(value > 10)) row_number() <= which.max(value > 10)-1 else TRUE)
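The condition relies on which.max() returning the position of the first TRUE in a logical vector. A quick base R check on the first few raw values of the question's two examples (vals_over and vals_under are illustrative names) also shows why the any() guard is needed: with no TRUE at all, which.max() still returns 1, which on its own would keep zero rows.

```r
# First values of the question's two examples (not the interpolated data)
vals_over  <- c(1, 5, 7, 11, 4, 12)
vals_under <- c(1, 5, 7, 1, 4, 2)

# which.max() on a logical vector returns the position of the first TRUE
which.max(vals_over > 10)   # 4, so rows 1 to 3 are kept

# With no TRUE at all it still returns 1 (the first FALSE) ...
which.max(vals_under > 10)  # 1

# ... which is why any() is checked first
any(vals_under > 10)        # FALSE, so the else branch keeps every row
```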

The same logic written with slice:

df %>% 
   slice(if(any(value > 10)) seq_len(which.max(value > 10)-1) else seq_len(n()))
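A different technique (not the one used above) avoids the if/else entirely: cumsum(value > 10) stays at 0 until the first value above 10, so keeping only rows where it equals 0 drops everything from that row on and keeps every row when 10 is never exceeded. dplyr's cumall(value <= 10) expresses the same idea. This sketch assumes value contains no NA; an NA would make the cumulative sum NA from that row on, and filter() would drop those rows.

```r
library(dplyr)

# Rebuild the question's two example data frames
df_over <- approx(seq(1, 10, 1), c(1, 5, 7, 11, 4, 12, 30, 20, 10, 9)) %>%
  as.data.frame() %>%
  rename(time = x, value = y)

df_under <- approx(seq(1, 10, 1), c(1, 5, 7, 1, 4, 2, 1, 2, 1, 9)) %>%
  as.data.frame() %>%
  rename(time = x, value = y)

# cumsum(value > 10) counts the exceedances seen so far; it is 0
# only on rows before the first value > 10 (assumes no NA in value)
res_over  <- df_over  %>% filter(cumsum(value > 10) == 0)
res_under <- df_under %>% filter(cumsum(value > 10) == 0)

nrow(res_under) == nrow(df_under)                   # TRUE: nothing removed
nrow(res_over) == which.max(df_over$value > 10) - 1 # TRUE: cut at first exceedance
```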

Microbenchmarking

In terms of speed, there is little difference between filter and slice; in the run below, filter was slightly faster (median about 586 vs. 654 microseconds):

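# approx() returns n = 50 interpolated points by default;
# pass n = 10^5 to get a data frame with 10^5 rows
df <- approx(seq(1,10^5,1), 
             round( runif(10^5, min = 1, max = 10^10) ) ) %>% 
      as.data.frame() %>% 
      rename(time = x, value = y)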

library(microbenchmark)

microbenchmark(
  filter = df %>% filter(if(any(value > 10)) row_number() <= which.max(value > 10)-1 else TRUE),
  slice  = df %>% slice(if(any(value > 10)) seq_len(which.max(value > 10)-1) else seq_len(n())),
  times  = 10000
)

Unit: microseconds
  expr     min       lq     mean   median       uq      max neval
 filter 551.522 570.2715 655.7250 586.3530 621.5590 13575.81 10000
 slice 614.276 633.6840 735.0398 654.2455 695.3795 14123.43 10000
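Since the question mentions many data frames, one way to run either version over all of them is to wrap it in a function and use lapply(). The sketch below assumes the data frames are collected in a named list; via_filter(), via_slice() and make_df() are hypothetical helper names. The final check only confirms that both versions return the same rows; it does not measure speed.

```r
library(dplyr)

# Hypothetical helpers wrapping the two versions from the answer
via_filter <- function(d) {
  d %>% filter(if (any(value > 10)) row_number() <= which.max(value > 10) - 1 else TRUE)
}
via_slice <- function(d) {
  d %>% slice(if (any(value > 10)) seq_len(which.max(value > 10) - 1) else seq_len(n()))
}

# Hypothetical helper that rebuilds the question's example data
make_df <- function(vals) {
  approx(seq(1, 10, 1), vals) %>%
    as.data.frame() %>%
    rename(time = x, value = y)
}

dfs <- list(
  over  = make_df(c(1, 5, 7, 11, 4, 12, 30, 20, 10, 9)),
  under = make_df(c(1, 5, 7, 1, 4, 2, 1, 2, 1, 9))
)

# Apply both versions to every data frame in the list
by_filter <- lapply(dfs, via_filter)
by_slice  <- lapply(dfs, via_slice)

sapply(by_filter, nrow)         # rows kept per data frame
all.equal(by_filter, by_slice)  # TRUE if both versions agree
```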
