Reputation: 1381
Is there a way to print the number of rows that each filter condition removes from a dataframe when using dplyr's filter function?
Consider a simple example dataframe which is filtered:
test.df <- data.frame(col1 = c(1,2,3,4,4,5,5,5))
filtered.df <- test.df %>% filter(col1 != 4, col1 != 5)
I would like this piece of code to output something like:
Filtered out 2 rows using: col1 != 4
Filtered out 3 rows using: col1 != 5
What I've tried so far in creating my own function:
print_filtered_rows <- function(dataframe, ...) {
  dataframe_new <- dataframe
  for (arg in list(...)) {
    print(arg)
    dataframe <- dataframe_new
    dataframe_new <- dataframe %>% filter(arg)
    rows_filtered <- nrow(dataframe) - nrow(dataframe_new)
    print(sprintf('Filtered out %s rows using: %s', rows_filtered, arg))
  }
  return(dataframe_new)
}
But I can't really get a grip on what ... actually is and how to use it. I've read:
http://adv-r.had.co.nz/Functions.html#function-arguments
But this hasn't really helped me.
Upvotes: 4
Views: 1757
Reputation: 898
Adding !! before arg in the filter call seems to fix Michael's nice function as of dplyr 1.0.0.
print_filtered_rows <- function(dataframe, ...) {
  df <- dataframe
  # Capture the filter conditions as unevaluated expressions
  vars <- as.list(substitute(list(...)))[-1L]
  for (arg in vars) {
    # Unquote with !! so filter() evaluates the captured expression
    dataframe_new <- df %>% filter(!!arg)
    rows_filtered <- nrow(df) - nrow(dataframe_new)
    cat(sprintf('Filtered out %s rows using: %s\n', rows_filtered, deparse(arg)))
    df <- dataframe_new
  }
  # Returning df also covers the case where no conditions were passed
  return(df)
}
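For what it's worth, here is a variant that avoids substitute() altogether: capture the conditions as quosures with rlang::enquos() and unquote each one with !!. This is only a sketch, assuming dplyr >= 1.0.0 (rlang is installed alongside it). The name print_filtered_rows2 is made up for this example, and rlang::as_label() is used only to get a printable label for each condition.

```r
library(dplyr)

# Hypothetical variant: apply each condition in turn and report the rows it drops.
print_filtered_rows2 <- function(dataframe, ...) {
  conditions <- rlang::enquos(...)  # one quosure per condition passed via ...
  for (cond in conditions) {
    before <- nrow(dataframe)
    dataframe <- filter(dataframe, !!cond)  # unquote the captured condition
    cat(sprintf("Filtered out %s rows using: %s\n",
                before - nrow(dataframe), rlang::as_label(cond)))
  }
  dataframe
}

test.df <- data.frame(col1 = c(1, 2, 3, 4, 4, 5, 5, 5))
filtered.df <- test.df %>% print_filtered_rows2(col1 != 4, col1 != 5)
```

With the example data from the question, the first condition drops the two 4s and the second drops the three 5s, leaving three rows.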
Upvotes: 0
Reputation: 1427
Very close! You're actually looking for the chapter on Non-Standard Evaluation.
library(dplyr)
print_filtered_rows <- function(dataframe, ...) {
  df <- dataframe
  vars <- as.list(substitute(list(...)))[-1L]
  for (arg in vars) {
    dataframe <- df
    dataframe_new <- dataframe %>% filter(arg)
    rows_filtered <- nrow(df) - nrow(dataframe_new)
    cat(sprintf('Filtered out %s rows using: %s\n', rows_filtered, deparse(arg)))
    df <- dataframe_new
  }
  return(dataframe_new)
}
data(iris)
iris %>%
  print_filtered_rows(Species == "virginica", Species != "virginica") %>%
  head()
#> Filtered out 100 rows using: Species == "virginica"
#> Filtered out 50 rows using: Species != "virginica"
#> [1] Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <0 rows> (or 0-length row.names)
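To make it concrete what substitute(list(...)) hands back, here is a tiny base-R sketch (capture_args is a made-up helper name, not part of any package). Each argument arrives as an unevaluated call rather than a value, which is why it can be deparsed to text for the message.

```r
# Hypothetical helper: return the arguments in ... as unevaluated expressions.
capture_args <- function(...) {
  as.list(substitute(list(...)))[-1L]  # drop the leading `list` symbol
}

exprs <- capture_args(col1 != 4, col1 != 5)
class(exprs[[1]])                     # "call": the expression, not its value
vapply(exprs, deparse, character(1))  # "col1 != 4" "col1 != 5"
```

Note that col1 does not need to exist when capture_args() is called, since nothing is evaluated until the expression is handed to filter().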
Upvotes: 4