Sam Firke

Reputation: 23014

Run an assert() check on a subset of the data in-line without modifying output data.frame

I want to exempt a few rows from a check with assertr::assert() without modifying the data.frame that's passed through.

For instance, say I want to assert that there are no duplicated values of mtcars$qsec among the rows where mtcars$am is 0. I want to exempt the rows where am == 1 and still get back all of mtcars.

This fails as it should:

library(assertr)
library(dplyr)  # provides filter(), used in the later examples
mtcars %>%
  assert(is_uniq, qsec)

And this works but passes through the filtered data.frame:

mtcars %>% 
  filter(am == 0) %>% 
  assert(is_uniq, qsec)

What I want is something like the following: it should succeed and pass all of the data through if there are no duplicated values of qsec where am == 0, and throw an error if there are:

mtcars %>% 
  assert(filter(., am == 0), is_uniq, qsec)

But that doesn't work. Is there a way I can check a subset of the data in a pipeline while still getting the whole data set out at the end?

Upvotes: 0

Views: 427

Answers (1)

Mikael Jagan

Reputation: 11326

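You can use a braced lambda expression, as documented in ?magrittr::`%>%`. Inside the braces, . is bound to the whole left-hand side, so you can run assert() on a filtered copy for its side effect and then return . itself: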

mtcars0 <- mtcars %>% { {assert(filter(., am == 0), is_uniq, qsec); .} }
identical(mtcars0, mtcars)
## [1] TRUE

Perhaps a more transparent example:

d <- data.frame(g = rep(1:2, each = 3), x = c(1, 2, 3, rep(4, 3)))
d
##   g x
## 1 1 1
## 2 1 2
## 3 1 3
## 4 2 4
## 5 2 4
## 6 2 4

d0 <- d %>% { {assert(filter(., g == 1), is_uniq, x); .} }
identical(d0, d)
## [1] TRUE

d %>% { {assert(filter(., g == 2), is_uniq, x); .} }
## Column 'x' violates assertion 'is_uniq' 3 times
##     verb redux_fn predicate column index value
## 1 assert       NA   is_uniq      x     1     4
## 2 assert       NA   is_uniq      x     2     4
## 3 assert       NA   is_uniq      x     3     4
##
## Error: assertr stopped execution
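
If you need this pattern in several places, it could be wrapped in a small function. The following is only a sketch: assert_where() is a hypothetical name, not part of assertr, and it assumes dplyr >= 1.0 so that the {{ }} embrace operator is available. Because the predicate is forwarded through an argument, the error report may label it as "predicate" rather than "is_uniq".

```r
library(assertr)
library(dplyr)

# Hypothetical helper (not part of assertr): check `predicate` on the
# columns given in `...`, but only for the rows matching `condition`.
# The full, unmodified data frame is returned when the check passes.
assert_where <- function(data, condition, predicate, ...) {
  rows_to_check <- filter(data, {{ condition }})
  # With assertr's default error_fun, this stops execution on a violation.
  assert(rows_to_check, predicate, ...)
  data
}

d <- data.frame(g = rep(1:2, each = 3), x = c(1, 2, 3, rep(4, 3)))

# Passes: x is unique within g == 1, so all six rows come back.
d0 <- d %>% assert_where(g == 1, is_uniq, x)
identical(d0, d)
```

Calling d %>% assert_where(g == 2, is_uniq, x) should stop with the same kind of error as the lambda version above, since x is duplicated within g == 2.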

Upvotes: 1
