Michael

Reputation: 1381

Print number of rows filtered out by dplyr's filter function

Is there a way to print the number of rows that each filter condition removes from a dataframe when using dplyr's filter function?

Consider a simple example dataframe which is filtered:

test.df <- data.frame(col1 = c(1,2,3,4,4,5,5,5))

filtered.df <- test.df %>% filter(col1 != 4, col1 != 5)

I would like this piece of code to output:

Filtered out 2 rows using: col1 != 4
Filtered out 3 rows using: col1 != 5

What I've tried so far in creating my own function:

print_filtered_rows <- function(dataframe, ...) {
    dataframe_new <- dataframe
    for (arg in list(...)) {
        print(arg)
        dataframe <- dataframe_new
        dataframe_new <- dataframe %>% filter(arg)
        rows_filtered <- nrow(dataframe) - nrow(dataframe_new)
        print(sprintf('Filtered out %s rows using: %s', rows_filtered, arg))
    }
    return(dataframe_new)
}

But I can't really get a grip on what ... actually is or how to use it. I've read:

http://adv-r.had.co.nz/Functions.html#function-arguments

But this hasn't really helped me.

Upvotes: 4

Views: 1757

Answers (2)

Kene David Nwosu

Reputation: 898

Adding !! before arg in the filter call seems to fix Michael's nice function as of dplyr 1.0.0.

print_filtered_rows <- function(dataframe, ...) {
  df <- dataframe
  vars = as.list(substitute(list(...)))[-1L]
  for(arg in vars) {
    dataframe <- df
    dataframe_new <- dataframe %>% filter(!!arg)
    rows_filtered <- nrow(df) - nrow(dataframe_new)
    cat(sprintf('Filtered out %s rows using: %s\n', rows_filtered, deparse(arg)))
    df = dataframe_new
  }
  return(dataframe_new)
}
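For comparison, here is a minimal sketch of the same idea written with rlang's quosures instead of substitute(). The helper name filter_verbose is hypothetical (it is not part of dplyr), and the sketch assumes a reasonably recent dplyr and rlang are installed. It captures each condition with rlang::enquos(), unquotes it with !! inside filter(), and labels it with rlang::as_label().

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): applies each condition in ...
# one at a time and reports how many rows each one removes.
filter_verbose <- function(data, ...) {
  conditions <- rlang::enquos(...)  # capture the conditions unevaluated
  for (i in seq_along(conditions)) {
    cond <- conditions[[i]]
    rows_before <- nrow(data)
    data <- filter(data, !!cond)    # unquote the captured condition
    message(sprintf("Filtered out %d rows using: %s",
                    rows_before - nrow(data), rlang::as_label(cond)))
  }
  data
}

# The example dataframe from the question
test.df <- data.frame(col1 = c(1, 2, 3, 4, 4, 5, 5, 5))
filtered.df <- test.df %>% filter_verbose(col1 != 4, col1 != 5)
```

On the question's test.df this should report 2 rows removed by col1 != 4 and 3 rows removed by col1 != 5, leaving 3 rows. The counts are sequential: each condition is applied to whatever the previous one left behind.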

Upvotes: 0

Michael Griffiths

Reputation: 1427

Very close! You're actually looking for the chapter on Non-Standard Evaluation.

library(dplyr)

print_filtered_rows <- function(dataframe, ...) {
  df <- dataframe
  vars = as.list(substitute(list(...)))[-1L]
  for(arg in vars) {
    dataframe <- df
    # With dplyr 1.0.0, unquote the captured expression instead: filter(!!arg)
    dataframe_new <- dataframe %>% filter(arg)
    rows_filtered <- nrow(df) - nrow(dataframe_new)
    cat(sprintf('Filtered out %s rows using: %s\n', rows_filtered, deparse(arg)))
    df = dataframe_new
  }
  return(dataframe_new)
}

data(iris)

iris %>% 
  print_filtered_rows(Species == "virginica", Species != "virginica") %>% 
  head()
#> Filtered out 100 rows using: Species == "virginica"
#> Filtered out 50 rows using: Species != "virginica"
#> [1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species     
#> <0 rows> (or 0-length row.names)

Upvotes: 4
