Reputation: 1875
I have some fake data:
library(tidyverse)
df <- data.frame(id = 1:20,
                 var1 = sample(c(0, 1), size = 20, replace = TRUE),
                 var2 = round(runif(20, min = 0, max = 100), 0),
                 var3 = round(runif(20, min = 0, max = 100), 0),
                 var4 = round(rnorm(20, mean = 50, sd = 20)),
                 var5 = sample(c(1:19, NA), size = 20))
Then I would like to run some checks on these data. The IDs of the rows that fail a check, together with an error message, should be collected in a data.frame called errors. I would like to call the function using the pipe operator %>%:
### Different checks
# There should be no missing values in var5
df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5")
# var3 should be greater than var4
df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")
# ... etc.
Then I define the function add_errors():
### Define function
errors <- data.frame(id = numeric(), errormessage = character())
add_errors <- function(dat, error) {
  # `<<-` appends the new rows to the global `errors` data frame
  errors <<- add_case(errors, id = dat[['id']], errormessage = error)
}
Upvotes: 1
Views: 202
Reputation: 18581
I know that this question is about creating a custom function to check for errors, but there is a nice package called {pointblank} that is made exactly for this kind of task.
Instead of setting up a data.frame called errors, we can set up a so-called "agent" and "interrogate" it to get a nice report. The package's website describes several alternative workflows for checking data; below is one possible way to use the package on your problem.
library(dplyr)
library(pointblank)
df <- data.frame(id = 1:20,
                 var1 = sample(c(0, 1), size = 20, replace = TRUE),
                 var2 = round(runif(20, min = 0, max = 100), 0),
                 var3 = round(runif(20, min = 0, max = 100), 0),
                 var4 = round(rnorm(20, mean = 50, sd = 20)),
                 var5 = sample(c(1:19, NA), size = 20))
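agent <- df %>%
  create_agent(
    label = "My error checks",
    actions = action_levels(stop_at = 1)
  ) %>%
  # var5 must not contain missing values
  col_vals_not_null(var5) %>%
  # var3 must be greater than var4: a precondition adds a helper column,
  # and the check requires that it is never FALSE (ties also fail here)
  col_vals_not_in_set(
    vars(var3_gt_var4),
    preconditions = ~ . %>% dplyr::mutate(var3_gt_var4 = var3 > var4),
    set = FALSE
  ) %>%
  interrogate()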
agent
Upvotes: 1
Reputation: 887941
We could either return the error message as a string, which is then printed on the console:
add_errors <- function(dat, error) {
  glue::glue("{error} at id: {toString(dat[['id']])}")
}
Testing:
df %>%
  filter(is.na(var5)) %>%
  add_errors("There are NAs in var5")
#There are NAs in var5 at id: 6
df %>%
  filter(var3 < var4) %>%
  add_errors("var3 is smaller than var4")
#var3 is smaller than var4 at id: 1, 2, 3, 4, 6, 7, 8, 11, 15, 16, 17, 20
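One detail of the glue version: when a check finds no rows, toString() of an empty id vector is "", so the result is a dangling message such as "There are NAs in var5 at id: ". A minimal base-R sketch that avoids this, assuming plain sprintf() formatting is acceptable in place of glue, returns NULL invisibly for an empty input:

```r
# Base-R variant of the message-returning add_errors().
# sprintf() and toString() replace glue::glue(); when no rows failed the
# check, nothing is returned (invisible NULL), so no message is printed.
add_errors <- function(dat, error) {
  if (nrow(dat) == 0) return(invisible(NULL))
  sprintf("%s at id: %s", error, toString(dat[["id"]]))
}

add_errors(data.frame(id = c(6, 9)), "There are NAs in var5")
#> [1] "There are NAs in var5 at id: 6, 9"
add_errors(data.frame(id = numeric()), "There are NAs in var5")
# (no output: the check found no rows)
```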
Or return a tibble/data.frame with the error message as output:
add_errors <- function(dat, error) {
  tibble(id = dat[['id']], errormessage = error)
}
df %>%
  filter(is.na(var5)) %>%
  add_errors("There are NAs in var5")
# A tibble: 1 x 2
# id errormessage
# <int> <chr>
#1 6 There are NAs in var5
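To end up with a single errors table, the per-check results can simply be stacked. The sketch below is a base-R illustration, assuming R >= 4.1 for the native pipe |> and using subset() in place of dplyr::filter(); the small data set is made up for the example. Note that, unlike tibble(), data.frame() does not recycle a length-1 message to zero rows, so rep() is used to keep checks without failures from throwing an error.

```r
# Base-R version of the data.frame-returning add_errors().
# rep() gives errormessage the same length as id, so a check with no
# failing rows yields a 0-row data.frame instead of an error.
add_errors <- function(dat, error) {
  data.frame(id = dat[["id"]],
             errormessage = rep(error, nrow(dat)),
             stringsAsFactors = FALSE)
}

# Small fixed example data (illustrative only, not the question's data)
df <- data.frame(id = 1:5,
                 var3 = c(10, 50, 20, 80, 5),
                 var4 = c(20, 40, 30, 70, 1),
                 var5 = c(1, NA, 3, 4, NA))

# Run each check with the native pipe and stack the results
errors <- rbind(
  df |> subset(is.na(var5)) |> add_errors("There are NAs in var5"),
  df |> subset(var3 < var4) |> add_errors("var3 is smaller than var4")
)
errors
```

Here errors contains ids 2 and 5 for the NA check and ids 1 and 3 for the var3/var4 check.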
Another option is to use the logger package, which makes it more flexible to log errors, warnings, info messages etc., each with a timestamp:
#remotes::install_github('daroczig/logger')
library(logger)
log_layout(layout_glue_colors)
t <- tempfile()
log_appender(appender_file(t))
log_info('Script starting up...')
# log only when the filter actually returned rows
df %>%
  filter(is.na(var5)) %>%
  {if (nrow(.) > 0) log_error('There are NAs in var5')}
df %>%
  filter(var3 < var4) %>%
  {if (nrow(.) > 0) log_error("var3 is smaller than var4")}
cat(readLines(t), sep="\n")
#INFO [2021-02-28 14:28:42] Script starting up...
#ERROR [2021-02-28 14:28:42] There are NAs in var5
#ERROR [2021-02-28 14:28:43] var3 is smaller than var4
unlink(t)
Here t is a temporary file; the log can also be written to a file in a custom destination folder.
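If adding the logger dependency is not an option, the same "LEVEL [timestamp] message" lines can be approximated in base R. This is a rough sketch, not logger's API: log_check() is a hypothetical helper, the line format only imitates the output shown above, and the native pipe |> assumes R >= 4.1.

```r
# Hypothetical helper (not part of logger): append "LEVEL [timestamp] message"
# to `file`, but only when the filtered data actually contains rows.
log_check <- function(dat, msg, file, level = "ERROR") {
  if (nrow(dat) > 0) {
    line <- sprintf("%s [%s] %s at id: %s", level,
                    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                    msg, toString(dat[["id"]]))
    cat(line, "\n", file = file, append = TRUE, sep = "")
  }
  invisible(dat)  # return the data so further pipe steps can follow
}

df <- data.frame(id = 1:5, var5 = c(1, NA, 3, 4, NA))
log_file <- tempfile()
df |> subset(is.na(var5)) |> log_check("There are NAs in var5", file = log_file)
df |> subset(var5 > 100) |> log_check("var5 is too large", file = log_file)  # no rows, nothing logged
cat(readLines(log_file), sep = "\n")
unlink(log_file)
```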
Upvotes: 1
Reputation: 3335
The following code does something similar to what you are asking. A plain assignment inside the function does not change the errors variable outside of it, so instead the function returns the updated data frame, and the result is assigned back to errors after each check.
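library(dplyr)

errors <- data.frame(id = numeric(), errormessage = character())

# Returns `errors` with the failing ids appended; the caller assigns it back.
# rep() keeps data.frame() from failing when a check finds no rows.
add_errors <- function(dat, errormessage) {
  bind_rows(errors, data.frame(id = dat$id, errormessage = rep(errormessage, nrow(dat))))
}

errors <- df %>% filter(is.na(var5)) %>% add_errors("There are NAs in var5")
errors <- df %>% filter(var3 < var4) %>% add_errors("var3 is smaller than var4")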
Output:
> print(errors)
id errormessage
1 3 There are NAs in var5
2 2 var3 is smaller than var4
3 3 var3 is smaller than var4
4 7 var3 is smaller than var4
5 8 var3 is smaller than var4
6 9 var3 is smaller than var4
7 12 var3 is smaller than var4
8 16 var3 is smaller than var4
9 18 var3 is smaller than var4
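The scoping behaviour mentioned above can be seen in isolation: `<-` inside a function creates a local copy, while `<<-` (which the question's add_errors() uses) assigns to the variable in an enclosing environment, here the global one. A minimal sketch; add_local() and add_global() are illustrative names only:

```r
# Global data frame to be modified
errors <- data.frame(id = numeric())

add_local <- function(id) {
  errors <- rbind(errors, data.frame(id = id))   # changes a local copy only
  invisible(NULL)
}
add_global <- function(id) {
  errors <<- rbind(errors, data.frame(id = id))  # changes the global `errors`
  invisible(NULL)
}

add_local(1)
nrow(errors)
#> [1] 0
add_global(1)
nrow(errors)
#> [1] 1
```

Returning the updated data frame and reassigning it, as in this answer, avoids relying on that side effect.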
Upvotes: 1