Relatively new to R, I discovered today that 1 == NA returns NA instead of the expected FALSE.
I looked extensively at this question for help, which is good, but it is oriented towards vectors and I'm working with data frames.
Here is a simplified example of what I'm working with:
library(tidyverse)
df_example <- expand_grid(shpmt = c(1:3), stoptype = c("P", "D"))
df_example$metgoal.ref <- c(1,NA,0,0,1,NA)
df_example$metgoal.tri <- c(1,NA,0,1,NA,1)
> df_example
# A tibble: 6 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     1 P                  1           1
2     1 D                 NA          NA
3     2 P                  0           0
4     2 D                  0           1
5     3 P                  1          NA
6     3 D                 NA           1
My goal is to see every instance where .ref and .tri are not the same, including rows where one of them is NA. I first tried a straightforward (I thought) inequality:
> filter(df_example, metgoal.ref != metgoal.tri) #Returns only inequalities without NAs.
# A tibble: 1 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     2 D                  0           1
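To see why only one row comes back, it can help to look at the comparison on its own. The sketch below uses plain vectors (ref and tri are stand-in names for the two columns, not part of the original code). The != operator yields NA whenever either side is NA, and filter() keeps only rows where the condition is TRUE, so the NA rows are dropped.

```r
# Stand-in vectors with the same values as metgoal.ref and metgoal.tri
ref <- c(1, NA, 0, 0, 1, NA)
tri <- c(1, NA, 0, 1, NA, 1)

# Element-wise comparison: any comparison involving NA yields NA
cmp <- ref != tri
print(cmp)

# Keep only positions where the comparison is TRUE (NA counts as "not TRUE"),
# which mirrors how filter() treats its condition
kept <- which(cmp)
print(kept)
```

Only position 4 survives, which corresponds to the single row (shipment 2, stop type D) in the output above.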
Initially, I didn't realize I was missing the NAs, but I now know I can get to them using is.na(), which is important because this construction allows me to find NA in either of the two columns (and this is why that question doesn't help me very much with its vectors). A downside is that it also returns rows where both columns are NA (and I care primarily about them being different):
> filter(df_example, is.na(metgoal.ref != metgoal.tri))
# A tibble: 3 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     1 D                 NA          NA #Not ideal -- I want only columns that disagree.
2     3 P                  1          NA
3     3 D                 NA           1
If I put the two constructions together, I can get what I want, except for the dual NA columns in row 1:
> filter(df_example, is.na(metgoal.ref != metgoal.tri) | (metgoal.ref != metgoal.tri))
# A tibble: 4 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     1 D                 NA          NA #Still not ideal
2     2 D                  0           1
3     3 P                  1          NA
4     3 D                 NA           1
But that is a lot to type and maintain for what I consider to be just one inequality, and I have many of these to do for other column sets, plus I have additional conditions to add:
> filter(df_example, (is.na(metgoal.ref != metgoal.tri) | (metgoal.ref != metgoal.tri)) & stoptype == "D")
#Added another condition, increasing complexity.
# A tibble: 3 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     1 D                 NA          NA
2     2 D                  0           1
3     3 D                 NA           1
I thought that perhaps the identical() function would be helpful, but if it can be, then I'm using it wrong and need help:
> filter(df_example, !identical(df_example$metgoal.ref, df_example$metgoal.tri))
#This does not work at all -- probably using it wrong.
# A tibble: 6 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     1 P                  1           1
2     1 D                 NA          NA
3     2 P                  0           0
4     2 D                  0           1
5     3 P                  1          NA
6     3 D                 NA           1
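A likely explanation, sketched here with plain vectors (ref and tri are stand-in names for the two columns): identical() compares its two arguments as whole objects and returns a single TRUE or FALSE, not one value per row. That single value then applies to every row alike, so all six rows are kept.

```r
# Stand-in vectors with the same values as metgoal.ref and metgoal.tri
ref <- c(1, NA, 0, 0, 1, NA)
tri <- c(1, NA, 0, 1, NA, 1)

# identical() answers "are these two objects exactly the same?" as a whole
whole <- identical(ref, tri)
print(whole)          # a single logical value
print(length(whole))  # length 1, not length 6

# Negating it still gives one value; spread over six rows it keeps them all
keep_all <- rep(!whole, length(ref))
print(keep_all)
```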
Another strategy I have seen for this situation is to replace the NAs with something that can be tested for inequality in the manner I originally tried:
df_example2 <- df_example %>%
  replace_na(list(metgoal.ref = 9, metgoal.tri = 9)) #Arbitrarily choosing 9 as replacement value
filter(df_example2, metgoal.ref != metgoal.tri)
# A tibble: 3 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     2 D                  0           1
2     3 P                  1           9
3     3 D                  9           1
I suspect that this last solution is the best I can possibly get, but down the line I expect to be doing aggregations and summary statistics on the metgoal.* columns, and it may be proper that they should remain NA. I can go back to the df_example before I converted it, but it sticks in my mind that there's a better solution to this, and it would improve my learning.
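One way to keep the columns as NA while still using the replacement idea might be to substitute the placeholder only inside the comparison. This is a rough sketch with plain vectors; ref, tri, sentinel, and fill_na() are hypothetical names introduced for illustration, and it assumes the sentinel value never occurs in the real data.

```r
# Stand-in vectors with the same values as metgoal.ref and metgoal.tri
ref <- c(1, NA, 0, 0, 1, NA)
tri <- c(1, NA, 0, 1, NA, 1)

# Hypothetical placeholder; assumed never to appear as a real value
sentinel <- 9

# Hypothetical helper: returns a copy of x with NA replaced by `value`
fill_na <- function(x, value) ifelse(is.na(x), value, x)

# The replacement happens only inside the comparison; ref and tri are unchanged
differs <- fill_na(ref, sentinel) != fill_na(tri, sentinel)
print(differs)

# Summary statistics on the untouched column still treat NA as missing
print(mean(ref, na.rm = TRUE))
# Whereas a permanently replaced column would pull the sentinel into the mean
print(mean(fill_na(ref, sentinel)))
```

Because the original vectors are never overwritten, later aggregations are not distorted by the arbitrary 9.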
Thank you in advance for any suggestions offered.
This seems to work. Please let me know what you think!
mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri)
Then you can use it to filter your original data. I added ! because you're interested in rows where these fields are NOT identical.
In the tidyverse it might look like this:
filter(df_example, !mapply(identical, metgoal.ref, metgoal.tri))
or in base R:
df_example[!mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri), ]
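For a quick check of what the mapply() call returns, and a possible vectorized alternative that avoids calling identical() once per row, here is a small sketch. The helper name differs_na() is made up for this example, and ref and tri are stand-ins for the two columns. One caveat worth knowing: identical() also compares types, so identical(1, 1L) is FALSE, which could matter if one column is integer and the other is double.

```r
# Stand-in vectors with the same values as metgoal.ref and metgoal.tri
ref <- c(1, NA, 0, 0, 1, NA)
tri <- c(1, NA, 0, 1, NA, 1)

# Row-by-row identical(): TRUE where both values match, including NA vs NA
same <- mapply(identical, ref, tri)
print(!same)

# Hypothetical vectorized helper: TRUE when exactly one side is NA,
# or when both are present and unequal; it never returns NA
differs_na <- function(a, b) {
  xor(is.na(a), is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
}
print(differs_na(ref, tri))
```

Inside filter() this could be used as filter(df_example, differs_na(metgoal.ref, metgoal.tri)), with further conditions such as stoptype == "D" passed as additional arguments.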