Reputation: 2551
I'm using the validate package in order to validate a dataframe.
I have some rules: some check the data type, and others represent constraints that the data needs to satisfy. My problem, however, is that the data type check is done array-wise and not record-wise. So when I want to get the rows that violate the rules using violating, I get the error message
"Error in violating(virg, rules) : Not all rules have record-wise output".
I made a small example illustrating the problem:
library(validate)
library(dplyr)
virg <- filter(iris, Species == "virginica")
virg$Sepal.Length[2] <- "hello"
virg$Sepal.Length[3] <- -3
rules <- validator(
Sepal.Length > 0
, is.numeric(Sepal.Length)
)
cf <- confront(virg, rules)
summary(cf)
violating(virg, rules)
I would like to get rows 2 and 3 as output, ideally with the information about which rule was violated. Is there an easy way to force record-wise output when checking for data types? How else can I check for violations?
Upvotes: 2
Views: 747
Reputation: 660
I came here with this exact same question. After looking at the paper and blog post at the end of this answer, I came up with two options.
The first is to select only the rules that have record-wise output and pass those to violating.
The second is to use the values function and extract the component of the output that corresponds to the row-wise evaluation of rules. In this case, it's the first element of the list, hence values(cf)[[1]]. Then select any row that fails at least one rule.
library(validate)
library(dplyr)
virg <- filter(iris, Species == "virginica")
virg$Sepal.Length[2] <- "hello"
virg$Sepal.Length[3] <- -3
rules <- validator(
Sepal.Length > 0
, is.numeric(Sepal.Length)
)
cf <- confront(virg, rules)
summary(cf)
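# option 1: keep only the record-wise rule (Sepal.Length > 0)
violating(virg, rules[1])
# option 2: take the record-wise results matrix from the confrontation
out <- values(cf)[[1]]
# TRUE for rows that pass every record-wise rule (NA results are ignored)
passes <- apply(out, 1, all, na.rm = TRUE)
# keep the rows that fail at least one rule
virg[!passes, ]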
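To also see which rule each row violated, one possibility is sketched below. This is only a sketch under some assumptions that are not in the original question: the helper column len_num and the rule names is_number and is_positive are made up for illustration, and a value is treated as "numeric" when as.numeric() can parse it. The type check is thereby turned into a record-wise rule, so every rule returns one result per row.

```r
library(validate)

virg <- subset(iris, Species == "virginica")
virg$Sepal.Length[2] <- "hello"   # this coerces the whole column to character
virg$Sepal.Length[3] <- -3

# Assumption: "numeric" means as.numeric() can parse the value.
# len_num is a hypothetical helper column; unparseable values become NA.
virg$len_num <- suppressWarnings(as.numeric(virg$Sepal.Length))

# Both rules are evaluated per record; the rule names are illustrative.
rules <- validator(
  is_number   = !is.na(len_num),
  is_positive = len_num > 0
)

cf  <- confront(virg, rules)
out <- values(cf)
if (is.list(out)) out <- out[[1]]   # fall back to the record-wise component

# TRUE only where a rule was evaluated and came out FALSE (NA does not count)
failed <- !out & !is.na(out)

# Positions of rows that fail at least one rule
rows <- unname(which(rowSums(failed) > 0))

# For every row, collect the names of the rules it fails
which_failed <- apply(failed, 1, function(r) paste(colnames(failed)[r], collapse = ", "))

report <- virg[rows, ]
report$violated <- unname(which_failed[rows])
report
```

With the data above, report should contain the second and third rows of virg, each with the name of the rule it failed in the violated column. Whether a parse-based check is the right definition of "numeric" depends on your data.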
Upvotes: 2