jejuba

Reputation: 199

How to find discordant cells in two data frames

I have two large data frames with identical row and column names. I would like to identify which "cells" differ. For instance, say I have tab1 and tab2:

tab1 <- data.frame(name=c('arthur', 'john', 'david', 'loopy'), grade=c(1, 4, 3, 2), size=c(23, 34, 23, 13))
tab2 <- data.frame(name=c('jean', 'john', 'david', 'loopy'), grade=c(1, 4, 5, 2), size=c(23, 34, 23, 16))

I would like the function to report [1,1], [3,2], and [4,3] as discordant.

There are numbers, factors, and character values in the cells. No dates.

Any ideas?

Upvotes: 0

Views: 84

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

Make sure your "name" columns are characters and not factors (comparing two factors whose level sets differ with == raises an error). Then you can simply use == to check for equality, or != to check for inequality:

> tab1$name <- as.character(tab1$name)
> tab2$name <- as.character(tab2$name)
> tab1 == tab2
      name grade  size
[1,] FALSE  TRUE  TRUE
[2,]  TRUE  TRUE  TRUE
[3,]  TRUE FALSE  TRUE
[4,]  TRUE  TRUE FALSE

To get the positions, use which(..., arr.ind = TRUE).

> which(tab1 != tab2, arr.ind = TRUE)
     row col
[1,]   1   1
[2,]   3   2
[3,]   4   3
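
If the data frames have several factor columns, the two steps above could be wrapped in a small helper. This is only a sketch: find_discordant is a hypothetical name (not a base R function), and it assumes both data frames have the same dimensions and column names, as described in the question.

```r
# Hypothetical helper, not part of base R: returns the (row, col)
# positions at which two same-shaped data frames differ.
find_discordant <- function(a, b) {
  # Assumption: both data frames have identical shape and column names.
  stopifnot(identical(dim(a), dim(b)),
            identical(names(a), names(b)))

  # Convert factor columns to character so that columns whose level
  # sets differ can still be compared with !=.
  to_chr <- function(df) {
    df[] <- lapply(df, function(x) if (is.factor(x)) as.character(x) else x)
    df
  }

  which(to_chr(a) != to_chr(b), arr.ind = TRUE)
}

tab1 <- data.frame(name = c('arthur', 'john', 'david', 'loopy'),
                   grade = c(1, 4, 3, 2), size = c(23, 34, 23, 13),
                   stringsAsFactors = TRUE)
tab2 <- data.frame(name = c('jean', 'john', 'david', 'loopy'),
                   grade = c(1, 4, 5, 2), size = c(23, 34, 23, 16),
                   stringsAsFactors = TRUE)

find_discordant(tab1, tab2)
```

Note that != returns NA when either value is NA, and which() skips NA entries, so cells with missing values are not reported by this sketch.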

Upvotes: 4
