Find equal rows between data frames, including NA as a value

Question

I have two data frames:

df = structure(list(x = c(NA, NA, "b", "b", "b"), y = c("f", "f", 
"f", "g", "g")), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

df2 = structure(list(x = c(NA, NA, "a", "b", "b"), y = c("g", "f", 
"f", "g", "g")), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

I would like to find the rows that are identical in both data frames, treating NA as an ordinary value.

df == df2

I would expect the second row to be TRUE in both columns, but the x column gives NA instead, because NA == NA evaluates to NA. The logic behind this is clear, but can df == df2 be modified so that such rows are considered equal?

akrun · Accepted Answer

One option is to replace the NAs with a value that does not occur in either dataset, do the comparison, and then use rowSums to check whether every column in a row matches:

rowSums(replace(df2, is.na(df2), "0") == replace(df, is.na(df), "0"))== 2
#[1] FALSE  TRUE FALSE  TRUE  TRUE
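
The replacement approach can be sketched a bit more defensively. This is only an illustration: the placeholder "__NA__" and the variable names sentinel and same_row are made up here, and the data frames are rebuilt as plain base R data frames so the snippet runs without extra packages. It checks that the placeholder is not already a real value and uses ncol(df) instead of a hardcoded 2.

```r
# Same data as in the question, as plain data frames (base R only).
df  <- data.frame(x = c(NA, NA, "b", "b", "b"), y = c("f", "f", "f", "g", "g"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c(NA, NA, "a", "b", "b"), y = c("g", "f", "f", "g", "g"),
                  stringsAsFactors = FALSE)

# Hypothetical placeholder; it must not occur as a real value in the data.
sentinel <- "__NA__"

# Guard: stop if the placeholder already appears in either data frame.
stopifnot(!any(df == sentinel, na.rm = TRUE),
          !any(df2 == sentinel, na.rm = TRUE))

# Swap NA for the placeholder, compare cell by cell,
# then require a match in every column of the row.
same_row <- rowSums(replace(df, is.na(df), sentinel) ==
                    replace(df2, is.na(df2), sentinel)) == ncol(df)

which(same_row)  # indices of the rows that are identical in both data frames
```

Using ncol(df) keeps the check valid if columns are added later, at the cost of assuming both data frames have the same columns in the same order.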

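Or, without replacing anything, build the condition with is.na so that a cell counts as a match when both values are NA, or when both are non-NA and equal:

rowSums((!is.na(df) & !is.na(df2) & df == df2) | (is.na(df) & is.na(df2))) == ncol(df)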
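
For reuse, an NA-aware comparison can be wrapped in a small helper. The following is a sketch under assumptions, not a definitive implementation: the name rows_equal_na is hypothetical, and it assumes both data frames have the same dimensions and column order. A cell is treated as matching only if both values are NA or both are non-NA and equal, so an NA on one side only counts as a mismatch.

```r
# Hypothetical helper: row-wise equality of two data frames, treating NA as a value.
rows_equal_na <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))            # same shape is assumed
  both_na    <- is.na(a) & is.na(b)               # NA on both sides counts as a match
  same_value <- !is.na(a) & !is.na(b) & a == b    # FALSE (never NA) if either side is NA
  unname(rowSums(both_na | same_value) == ncol(a))
}

# Data from the question, as plain data frames.
df  <- data.frame(x = c(NA, NA, "b", "b", "b"), y = c("f", "f", "f", "g", "g"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c(NA, NA, "a", "b", "b"), y = c("g", "f", "f", "g", "g"),
                  stringsAsFactors = FALSE)

res <- rows_equal_na(df, df2)

# An NA on only one side is a mismatch, whichever side it is on.
one_sided <- rows_equal_na(
  data.frame(x = c(NA, "a"), y = c("f", "f"), stringsAsFactors = FALSE),
  data.frame(x = c("a", NA), y = c("f", "f"), stringsAsFactors = FALSE)
)
```

Because FALSE & NA is FALSE in R, the same_value term never produces NA, so rowSums does not need na.rm here.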
