nicshah

Reputation: 345

Find cells where two data frames differ

How can I find the individual cells that differ between two data frames?

df1 = structure(list(Name = c(10746359L, 11034174L, 10279660L, 10127534L, 
10956764L, 10172699L, 10689723L, 10966980L, 10497750L, 10833372L, 
10077477L), Green = c(98L, 86L, 15L, 29L, 77L, 87L, 83L, 79L, 
75L, 46L, 40L), Blue = c(23L, 82L, 48L, 19L, 13L, 41L, 70L, 78L, 
100L, 76L, 75L), Red = c(78L, 55L, 14L, 100L, 59L, 40L, 67L, 
70L, 19L, 39L, 83L), Orange = c(1L, 75L, 17L, 14L, 74L, 53L, 
53L, 78L, 60L, 27L, 86L), Yellow = c("Berlin", "London", "Frankfurt", 
"Beijing", "New York", "Chicago", "Auckland", "Sydney", "Paris", 
"Barcelona", "Madrid"), Violet = c(0.558015352, 0.997666691, 
0.035279025, 0.921518397, 0.172728814, 0.772205286, 0.390398637, 
0.362153606, 0.650357655, 0.606278069, 0.442747248)), class = "data.frame", row.names = c(NA, 
-11L))

df2 = structure(list(Name = c(10746359L, 11034174L, 10279660L, 10127534L, 
10956764L, 10172699L, 10689723L, 10966980L, 10497750L, 10833372L, 
10077477L), Green = c(98L, 86L, 15L, 29L, 77L, 87L, 83L, 79L, 
75L, 46L, 40L), Blue = c(23L, 82L, 48L, 19L, 13L, 41L, 70L, 42L, 
100L, 76L, 75L), Red = c(78L, 55L, 14L, 100L, 59L, 40L, 67L, 
70L, 19L, 39L, 83L), Orange = c(1L, 75L, 17L, 14L, 74L, 53L, 
53L, 78L, 60L, 27L, 86L), Yellow = c("Berlin", "Melbourne", "Frankfurt", 
"Beijing", "New York", "Chicago", "Auckland", "Sydney", "Paris", 
"Barcelona", "Madrid"), Violet = c(0.558015352, 0.997666691, 
0.035279025, 0.921518397, 0.172728814, 0.772205286, 0.390398637, 
0.362153606, 0.650357655, 0.606278069, 0.442747248)), class = "data.frame", row.names = c(NA, 
-11L))

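The cells that differ between the two data frames are row 2, column Yellow ("London" in df1 vs "Melbourne" in df2), and row 8, column Blue (78 in df1 vs 42 in df2).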

I tried setdiff, but it returns entire rows rather than the individual cells that differ.

Upvotes: 1


Answers (1)

Zheyuan Li

Reputation: 73265

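We can compare the two data frames element-wise with != and let which(..., arr.ind = TRUE) return the row and column indices of the cells that differ: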

ij <- which(df1 != df2, arr.ind = TRUE)
#     row col
#[1,]   8   3
#[2,]   2   6
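One caveat (an addition, not part of the original answer): != yields NA wherever either cell is NA, and which() silently drops NA entries, so a cell that is NA in one data frame but not in the other would be missed. A minimal sketch of an NA-aware mask, using small hypothetical data frames a and b, that treats two NAs as equal and an NA against a value as a difference:

```r
# Hypothetical toy data: row 3 of column x is NA in a but 3 in b
a <- data.frame(x = c(1, 2, NA), y = c("p", "q", "r"))
b <- data.frame(x = c(1, 5, 3),  y = c("p", "q", "r"))

# Plain comparison: the NA vs 3 cell compares as NA and is dropped by which()
plain <- which(a != b, arr.ind = TRUE)

# is.na() on a data frame returns a logical matrix of the same shape
na_a <- is.na(a)
na_b <- is.na(b)

# A cell differs if exactly one side is NA, or both are present and unequal
diff_mask <- (na_a != na_b) | (!na_a & !na_b & (a != b))
na_aware <- which(diff_mask, arr.ind = TRUE)

print(plain)     # only row 2, col 1
print(na_aware)  # rows 2 and 3, col 1
```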

If you prefer column names to column indices, then

data.frame(row = ij[, 1], col = names(df1)[ij[, 2]])
#  row    col
#1   8   Blue
#2   2 Yellow
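If you also want to see what changed, the same indices can be used to pull the old and new values out of each data frame. This is a sketch on small hypothetical data frames a and b, not part of the original answer; values are converted to character because the differing cells can come from columns of different types:

```r
# Hypothetical toy data: one change in 'colour', one in 'n'
a <- data.frame(id = 1:3, colour = c("red", "green", "blue"), n = c(10L, 20L, 30L))
b <- data.frame(id = 1:3, colour = c("red", "teal", "blue"), n = c(10L, 20L, 99L))

ij <- which(a != b, arr.ind = TRUE)

# Walk the (row, col) pairs and extract each cell as character
cell_value <- function(df) {
  mapply(function(i, j) as.character(df[i, j]),
         ij[, "row"], ij[, "col"], USE.NAMES = FALSE)
}

changes <- data.frame(row = ij[, "row"],
                      col = names(a)[ij[, "col"]],
                      old = cell_value(a),
                      new = cell_value(b))
print(changes)
```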

Of course, since != compares the two data frames position by position, before using it you should make sure that

  • identical(dim(df1), dim(df2)) is TRUE;

  • identical(names(df1), names(df2)) is TRUE;

  • identical(sapply(df1, class), sapply(df2, class)) is TRUE.
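The three checks above can be bundled with the comparison into one function. The name diff_cells and the toy data frames p, q and r below are hypothetical; this is only a sketch that stops with an error when any check fails and otherwise returns the differing cells by row index and column name:

```r
# Hypothetical helper: run the sanity checks, then list the differing cells
diff_cells <- function(x, y) {
  stopifnot(
    identical(dim(x), dim(y)),
    identical(names(x), names(y)),
    identical(sapply(x, class), sapply(y, class))
  )
  ij <- which(x != y, arr.ind = TRUE)
  data.frame(row = ij[, "row"], col = names(x)[ij[, "col"]])
}

# Usage on hypothetical data: only row 2 of column v differs
p <- data.frame(k = 1:3, v = c("a", "b", "c"))
q <- data.frame(k = 1:3, v = c("a", "B", "c"))
res <- diff_cells(p, q)
print(res)
```

Calling diff_cells(df1, df2) on the question's data should give the same two cells as the answer; a pair with mismatched column names, such as p and a data frame whose second column is called w, stops at the second check instead of comparing the wrong columns.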

Upvotes: 2
