Identify differences between 2 data frames with missing values

Question

Suppose I have 2 data frames:

a1 <- data.frame(a = 1:5, b=2:6)
a2 <- data.frame(a = 1:5, b=c(2:5,NA))

I would like to identify which columns are not identical (I will need the column number later). I thought that this would do the trick:

apply(!a1==a2, 2, sum, na.rm=TRUE)

However, because the last entry of a2$b is NA, the comparison returns NA at that position, na.rm=TRUE drops it, and column b is reported as having 0 differences.

Rich Scriven · Accepted Answer

Not sure why you're using sum, but to identify which columns are not identical you could use mapply with identical and negate the result.

which(!mapply(identical, a1, a2))
# b 
# 2

This gives the column number. Or, more simply, for use in a column subset, drop which to get a logical vector:

!mapply(identical, a1, a2)
#     a     b 
# FALSE  TRUE
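
If a per-column count of differing entries is what the sum was meant to produce, one possible sketch is an NA-aware comparison that counts a pair as different when the values differ or when exactly one of them is NA. The helper name count_diffs and the variable n_diff are made up for this illustration, and treating two NAs in the same position as equal is an assumption that may not fit every use case.

```r
a1 <- data.frame(a = 1:5, b = 2:6)
a2 <- data.frame(a = 1:5, b = c(2:5, NA))

# Hypothetical helper: count positions where two vectors differ,
# treating NA vs. non-NA as a difference and NA vs. NA as equal.
count_diffs <- function(x, y) {
  one_na <- xor(is.na(x), is.na(y))           # exactly one side is missing
  neq    <- !is.na(x) & !is.na(y) & x != y    # both present and unequal
  sum(one_na | neq)
}

n_diff <- mapply(count_diffs, a1, a2)
n_diff
# a b 
# 0 1

which(n_diff > 0)
# b 
# 2
```

Unlike identical, this compares values with !=, so it ignores differences in type or attributes between the columns.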

Just as a note, the word identical has a meaning in R that may be different from the result of ==, so it's possible you may need to clarify your question a bit.

x <- 1
y <- 1L
x == y
# [1] TRUE
identical(x, y)
# [1] FALSE
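
The same distinction shows up at the column level. As a small sketch with made-up data frames d1 and d2, a double column and an integer column holding the same values are flagged as different by identical; coercing both sides with as.numeric first is one way to compare values only, assuming the columns are numeric and the type difference does not matter for your purpose.

```r
d1 <- data.frame(a = c(1, 2, 3))   # double column
d2 <- data.frame(a = 1:3)          # integer column, same values

# identical compares type as well as values, so the column is flagged
strict <- !mapply(identical, d1, d2)
strict
#    a 
# TRUE

# Coerce both sides to double before comparing to look at values only
# (as.numeric also drops attributes such as names)
loose <- !mapply(function(x, y) identical(as.numeric(x), as.numeric(y)), d1, d2)
loose
#     a 
# FALSE
```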
