Comparing data frame rows containing NAs

Question

I have a data frame with two columns:

x <- c(1, 2, 3, 4, NA, 5, 6)
y <- c(1, 2, 4, 5, 0, 5, 6)

my.df <- data.frame(x, y)

I want to keep only the rows where x != y.

What I did is this:

my.df <- subset(my.df, x != y)

What I expected was:

x  y
3  4
4  5
NA 0

What I got was:

x  y
3  4
4  5

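This is because, by R's convention, any comparison with NA returns NA (so NA != 0 is NA, not TRUE), and subset() only keeps the rows where the condition is TRUE, so the row with the NA is dropped.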
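A minimal sketch illustrating this behaviour on the same data: the element-wise comparison yields NA for the 5th row, subset() silently drops that row, and plain `[` indexing with the same logical vector instead returns a row filled with NA. The variable name `cond` is only illustrative.

```r
x <- c(1, 2, 3, 4, NA, 5, 6)
y <- c(1, 2, 4, 5, 0, 5, 6)
my.df <- data.frame(x, y)

# Element-wise comparison; the 5th element is NA rather than TRUE
# values: FALSE FALSE TRUE TRUE NA FALSE FALSE
cond <- my.df$x != my.df$y
print(cond)

# subset() treats an NA condition like FALSE, so row 5 disappears
kept <- subset(my.df, x != y)
print(kept)

# `[` with a logical NA index gives a row of NAs instead of dropping it
indexed <- my.df[cond, ]
print(indexed)
```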

I do want to keep the row containing the NA in the subset, because I'm looking for the differences between the two columns.

How can I achieve this?

akrun · Accepted Answer

One option is to add an | (OR) condition that also keeps the rows where 'x' is NA:

subset(my.df, x != y | is.na(x))

If 'y' can also contain NA values:

subset(my.df, x != y | is.na(x) | is.na(y))
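An equivalent way to express the condition above, sketched under the assumption that you want every row where the values differ or cannot be compared: compute the comparison once and also keep the positions where it is NA (which happens whenever 'x' or 'y' is NA). The names `r` and `res` are illustrative only.

```r
x <- c(1, 2, 3, 4, NA, 5, 6)
y <- c(1, 2, 4, 5, 0, 5, 6)
my.df <- data.frame(x, y)

# Element-wise comparison; NA wherever x or y is NA
r <- with(my.df, x != y)

# TRUE (values differ) or NA (not comparable) -> keep the row.
# `r | is.na(r)` contains no NA, so plain `[` indexing is safe here.
res <- my.df[r | is.na(r), ]
print(res)
```

Like the subset() call above, this also keeps rows where both 'x' and 'y' are NA.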

It is not clear how rows where both 'x' and 'y' are NA should be handled. If they should be dropped because the two values are the same (both missing):

subset(my.df, (x != y | is.na(x) | is.na(y)) & !(is.na(x) & is.na(y)))
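If this comparison is needed in several places, the last condition could be wrapped in a small helper. This is a sketch; `neq_na` is a hypothetical name, not a base R function. It treats two NAs as equal and an NA versus a non-missing value as different, matching the condition above, and it never returns NA.

```r
# Hypothetical helper: NA-aware "not equal"
neq_na <- function(a, b) {
  both_present <- !is.na(a) & !is.na(b)
  # xor() is TRUE when exactly one side is missing;
  # otherwise compare the values only where both are present
  xor(is.na(a), is.na(b)) | (both_present & a != b)
}

# Example data with an extra row where both columns are NA
x <- c(1, 2, 3, 4, NA, 5, 6, NA)
y <- c(1, 2, 4, 5, 0, 5, 6, NA)
my.df <- data.frame(x, y)

res <- subset(my.df, neq_na(x, y))
print(res)
```

Because the result contains no NA, it behaves the same with subset() and with plain `[` indexing.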
