I am trying to use the R package validate for our data QC process. We enter our data in the field, where small mistakes happen, and we also have a lot of old data that is quite messy. I have figured out how to write rules using validator(), see how many items passed or failed using confront(), and see which rows violate the rules using violating(). What I need is a way to tell which rules are violated in which rows.
library(validate)

# an example dataframe
plot <- 1:4
DBH <- c(1.1, 2, 0.7, 3.2)
df <- data.frame(plot, DBH)
# creating a rule - DBH must be greater than or equal to 1
rule <- validator(DBH >= 1)
# confronting the dataset with the rule
x <- confront(df, rule)
summary(x)
# show which rows of the dataframe do not comply with the rule
y <- violating(df, rule)
y
confront() tells me which rules were violated how many times, and violating() tells me which dataframe rows violated the rules. How do I merge the two so I can tell which rules were violated where?
Is there a way to add a column to the output of violating() that tells the user which rule or rules each row violated? I know I can add names and descriptions to my rules, but I need a way to merge that information into the output of violating().
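To make the desired output concrete, here is a minimal sketch of one way such a column might be built. It assumes that values() on the confrontation object returns a logical matrix with one row per record and one column per rule, which I believe holds when every rule is evaluated row by row. The rule names dbh_min and dbh_max, the second rule, and the column name violated are hypothetical and only there for illustration.

```r
library(validate)

# same example data as above
df <- data.frame(plot = 1:4, DBH = c(1.1, 2, 0.7, 3.2))

# named rules; the names and the second rule are hypothetical
rules <- validator(
  dbh_min = DBH >= 1,
  dbh_max = DBH <= 3
)

cf <- confront(df, rules)

# assumed: a logical matrix, rows = records, columns = rules
res <- values(cf)

# TRUE where a rule failed; NA results are not counted as failures here
failed <- !res & !is.na(res)

# collapse the names of the failed rules into one string per row
df$violated <- unname(apply(failed, 1, function(r) {
  paste(colnames(failed)[r], collapse = ", ")
}))

# keep only the rows that violate at least one rule
bad_rows <- df[df$violated != "", ]
bad_rows
```

Rows that pass every rule get an empty string in violated, so filtering on that column should give the same rows as violating(), with the rule names attached.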
My actual dataset has 50+ variables and many more rules, so multiple rules could be violated for a single row of the dataframe.
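With that many rules, a long format with one line per (row, rule) pair may be easier to work with than a single text column. The sketch below assumes that confront() accepts a key argument naming an identifier column, and that as.data.frame() on the result returns that key alongside name, value, and expression columns. The rule names and the second rule are again hypothetical.

```r
library(validate)

# same example data as above
df <- data.frame(plot = 1:4, DBH = c(1.1, 2, 0.7, 3.2))

# named rules; the names and the second rule are hypothetical
rules <- validator(
  dbh_min = DBH >= 1,
  dbh_max = DBH <= 3
)

# key = "plot" asks validate to carry the plot id through to the results
cf <- confront(df, rules, key = "plot")

# assumed long format: one line per record and rule
long <- as.data.frame(cf)

# keep only the failed checks, with the id, rule name, and rule expression
viol <- long[!is.na(long$value) & !long$value,
             c("plot", "name", "expression")]
viol
```

A row that breaks several rules would appear once per broken rule, and the result could be joined back to the original data on plot if the other variables are needed.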