PMaier

Reputation: 654

Identify differences between 2 data frames with missing values

Suppose I have 2 data frames:

a1 <- data.frame(a = 1:5, b=2:6)
a2 <- data.frame(a = 1:5, b=c(2:5,NA))

I would like to identify which columns are not identical (I will need the column number later). I thought that this would do the trick:

apply(!a1==a2, 2, sum, na.rm=TRUE)

However, because the last entry in a2 is NA, this doesn't work: it reports 0 differences for column b.
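A minimal sketch of why the count comes out as 0, using the same a1 and a2 as above (the names cmp and diff_count are just illustrative): comparing against NA yields NA rather than TRUE or FALSE, negating it still gives NA, and na.rm=TRUE then drops that entry before summing.

```r
a1 <- data.frame(a = 1:5, b = 2:6)
a2 <- data.frame(a = 1:5, b = c(2:5, NA))

# Element-wise comparison: the result is a logical matrix,
# with NA wherever either side is NA
cmp <- a1 == a2
cmp[5, "b"]

# !NA is still NA, so na.rm = TRUE removes the only differing cell
diff_count <- apply(!cmp, 2, sum, na.rm = TRUE)
diff_count
```

So the mismatch in column b is silently discarded rather than counted.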

Upvotes: 0

Views: 55

Answers (2)

akrun

Reputation: 886938

If you wanted to use sum, you could try

colSums(a1==a2, na.rm=TRUE)!=nrow(a1)
#     a     b 
# FALSE  TRUE 

Or using your code

apply(a1==a2, 2, sum, na.rm=TRUE)!=nrow(a1)
#     a     b 
# FALSE  TRUE 
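Since the question asks for the column number, one possible extension is to wrap the comparison in which(). This is only a sketch: the names matches and diff_cols are illustrative. It counts the cells that compare equal in each column and flags any column where that count falls short of the number of rows.

```r
a1 <- data.frame(a = 1:5, b = 2:6)
a2 <- data.frame(a = 1:5, b = c(2:5, NA))

# Number of cells per column that compare equal (NA comparisons are dropped)
matches <- colSums(a1 == a2, na.rm = TRUE)

# Columns with fewer matches than rows contain at least one difference
diff_cols <- which(matches != nrow(a1))
diff_cols
```

Note that with this approach a position where both data frames hold NA also counts as a difference, because NA == NA is NA and is dropped from the match count.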

Upvotes: 0

Rich Scriven

Reputation: 99321

Not sure why you're using sum, but to identify which columns are not identical you could use mapply with identical and negate the result.

which(!mapply(identical, a1, a2))
# b 
# 2 

This gives the column number. Or, more simply, for use in a column subset:

!mapply(identical, a1, a2)
#     a     b 
# FALSE  TRUE 

Just as a note, the word identical has a meaning in R that may be different from the result of ==, so it's possible you may need to clarify your question a bit.

x <- 1
y <- 1L
x == y
# [1] TRUE
identical(x, y)
# [1] FALSE
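The same distinction shows up at the column level. As a sketch, assume a hypothetical third data frame a3 (not part of the question) whose column a holds the same values as a1 but stored as double instead of integer:

```r
a1 <- data.frame(a = 1:5, b = 2:6)
# Hypothetical data frame: same values as a1, but column a is double
a3 <- data.frame(a = as.numeric(1:5), b = 2:6)

# identical() also compares storage type, so column a is flagged
strict <- !mapply(identical, a1, a3)
strict

# == compares values only, so no column is flagged
loose <- colSums(a1 == a3) != nrow(a1)
loose
```

Which of the two is appropriate depends on whether a type difference should count as a difference for your purposes.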

Upvotes: 2
