Emily

Reputation: 15

R validate package - return rules that are violated for each row

I am trying to use the R package validate for our data QC process (we enter data in the field, where small mistakes happen, and we also have a lot of old, messy data). I have figured out how to write rules with validator(), count how many items passed or failed with confront(), and list the records that violate the rules with violating(). What I need is a way to tell which rules are violated by which records.

library(validate)

# an example data frame
plot <- 1:4
DBH <- c(1.1, 2, 0.7, 3.2)
df <- data.frame(plot, DBH)

# creating a rule - DBH must be greater than or equal to 1
rule <- validator(DBH >= 1)

# confronting the dataset with the rule
x <- confront(df, rule)
summary(x)

# show which rows of the dataframe do not comply with the rule
y <- violating(df, rule)
y

confront() tells me how many times each rule was violated, and violating() tells me which rows of the data frame violated at least one rule. How do I combine the two so I can tell which rules were violated in which rows?

Is there a way to add a column to the output of violating() that tells the user which rule or rules each row violated? I know I can add names and descriptions to my rules, but I need a way to carry them into the output of violating().
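To make the desired output concrete, here is a minimal sketch of one possible direction, not a confirmed solution. It assumes that confront() accepts a key argument naming an identifier column and that as.data.frame() on the result gives one row per record/rule pair with columns name, value and expression. The rule names dbh_min and dbh_max, and the upper limit of 3, are hypothetical and only there for illustration.

```r
library(validate)

df <- data.frame(plot = 1:4, DBH = c(1.1, 2, 0.7, 3.2))

# Named rules; dbh_max is a made-up second rule for illustration
rules <- validator(dbh_min = DBH >= 1, dbh_max = DBH <= 3)

# key = "plot" asks confront() to carry the plot id through to the results
out <- confront(df, rules, key = "plot")

# Long format: one row per (record, rule) combination
long <- as.data.frame(out)

# Keep only the failed checks, then attach them to the original rows
fails <- long[long$value %in% FALSE, c("plot", "name", "expression")]
report <- merge(df, fails, by = "plot")
report
```

With this layout a row that breaks several rules would appear once per violated rule, which may or may not be the shape you want for a QC report.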

My actual dataset has 50+ variables and many more rules, so multiple rules could be violated for a single row of the dataframe.
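For the case where one row breaks several rules, a rough sketch of the kind of output I am after is below. It assumes that values() on the confront() result returns a single logical matrix with one row per record and one column per rule, which should hold only when every rule is evaluated record by record. The rule names and the upper limit are again hypothetical.

```r
library(validate)

df <- data.frame(plot = 1:4, DBH = c(1.1, 2, 0.7, 3.2))

# Named rules, so the names can be reported per row (dbh_max is illustrative)
rules <- validator(dbh_min = DBH >= 1, dbh_max = DBH <= 3)

out <- confront(df, rules)

# Assumed shape: logical matrix, rows = records, columns = rules
v <- values(out)

# For each record, collapse the names of the rules that evaluated to FALSE
# (%in% FALSE treats NA results as "not violated")
df$violated <- unname(apply(v, 1, function(r) {
  paste(colnames(v)[r %in% FALSE], collapse = ", ")
}))

# Rows with at least one violated rule, plus the new column
df[df$violated != "", ]
```

This keeps one line per record, with the violated rule names in a single comma-separated column, which seems easier to scan when there are 50+ variables.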

Upvotes: 1

Views: 50

Answers (0)
