Reputation: 154
I am creating some pre-load files that need to be cleaned by checking that the sum of two columns equals the total column. The data was entered manually by RAs, so it is prone to error. My problem is verifying that the data is clean and, if there is an error, finding the easiest way to identify the rows that don't add up by returning their ID numbers. This is my data:
df1 <- data.frame(
id = c(1,2,3,4,5,6,7),
male = c(2,4,2,6,3,4,5),
female = c(3,6,4,9,2,4,1),
Total = c(5,10,7,15,6,8,7)
)
The code I am looking for is supposed to check whether male + female == Total in each row, and ONLY return an error where there is a disagreement. For the data above, I would expect an error along the lines of: the sum of male and female in 3 rows, with IDs 3, 5 and 7, is not equal to the total.
Upvotes: 0
Views: 41
Reputation: 143
You could also do something fancier, like this one-liner:
df1$id[apply(df1[c('male','female')], 1, sum) != df1$Total]
which will give you just the ids (Aziz's answer works great too)
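A vectorized variant of the same idea, as a rough sketch: rowSums() adds the two columns for all rows at once, so the per-row apply() call is not needed. The name bad_ids is only illustrative, and the data frame is repeated so the snippet runs on its own.

```r
df1 <- data.frame(
  id = c(1,2,3,4,5,6,7),
  male = c(2,4,2,6,3,4,5),
  female = c(3,6,4,9,2,4,1),
  Total = c(5,10,7,15,6,8,7)
)

# rowSums() adds male and female for every row in one call;
# the comparison then picks out the ids whose sum differs from Total
bad_ids <- df1$id[rowSums(df1[c("male", "female")]) != df1$Total]
bad_ids
```

For the sample data this should return the ids 3, 5 and 7.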
Upvotes: 1
Reputation: 20705
You can use:
mismatch_rows = which(df1$male + df1$female != df1$Total)
This gives you the indices of the rows that don't match. If you want the actual rows, you can simply use:
df1[mismatch_rows,]
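If you want an actual error that lists the IDs, as the question describes, one possible sketch is to wrap the check in a small function that calls stop() only when something disagrees. Note that check_totals is a made-up helper name and the message wording is just a guess at what is wanted. The tryCatch() call is only there so the example keeps running and the message can be inspected.

```r
# check_totals() is a hypothetical helper, not part of any package
check_totals <- function(df) {
  # which() drops NA comparisons, so rows with missing values are not flagged here
  bad <- which(df$male + df$female != df$Total)
  if (length(bad) > 0) {
    stop(sprintf(
      "male + female does not equal Total in %d row(s) with ID %s",
      length(bad), paste(df$id[bad], collapse = ", ")
    ), call. = FALSE)
  }
  invisible(TRUE)  # nothing to report when every row adds up
}

df1 <- data.frame(
  id = c(1,2,3,4,5,6,7),
  male = c(2,4,2,6,3,4,5),
  female = c(3,6,4,9,2,4,1),
  Total = c(5,10,7,15,6,8,7)
)

# Capture the error message instead of halting the script
msg <- tryCatch(check_totals(df1), error = function(e) conditionMessage(e))
print(msg)
```

With the sample data the message names IDs 3, 5 and 7. On a data frame where every row adds up, the function returns silently.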
Upvotes: 0