R: Logical operations on two columns accounting for presence of NA in both

I have this data frame with two columns, each of which can take either the value left or right, or be missing (NA).

test_df <- data.frame(col1 = c("right","left","right",NA),
                      col2 = c("left","right",NA,"right"))

test_df

#    col1  col2
# 1 right  left
# 2  left right
# 3 right  <NA>
# 4  <NA> right

Now I want to test this compound condition:

test_df$col1 == "left" | test_df$col2 == "right"

# [1] FALSE  TRUE    NA  TRUE

The first three results are as expected, but why is the last result TRUE instead of NA? What's the difference between rows 3 and 4?

Answers (1)

RHertel

In your code you are testing whether at least one of two conditions is fulfilled: "left" in col1 or "right" in col2. In row 4, col2 contains "right", so the result is TRUE irrespective of what may or may not be in col1. The situation is different in row 3. There, col1 does not contain "left", so whether the statement is TRUE or FALSE depends entirely on whether col2 contains "right". Since the entry in col2 for row 3 is NA, the comparison cannot be decided and, accordingly, the output is NA. In short, NA | TRUE is TRUE, while FALSE | NA is NA.
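To see the two cases side by side, the comparisons for rows 3 and 4 can be evaluated by hand. This is a small sketch that hard-codes each row's values rather than reading them from test_df:

```r
# Row 3: col1 is "right", col2 is NA
row3 <- ("right" == "left") | (NA == "right")  # FALSE | NA
# Row 4: col1 is NA, col2 is "right"
row4 <- (NA == "left") | ("right" == "right")  # NA | TRUE

row3  # [1] NA   (the outcome hinges on the unknown value)
row4  # [1] TRUE (one TRUE operand is enough, whatever the NA would have been)
```

The same reasoning applies to & in mirror image: NA & FALSE is FALSE, while NA & TRUE is NA.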

If you want a comparison of the entries in col1 and col2 along the lines you described, but one that returns NA whenever either of the two entries is NA, you could use:

as.logical((test_df$col1 == "left") + (test_df$col2 == "right"))
#[1] FALSE  TRUE    NA    NA

In this line of code, the results of the individual comparisons (TRUE, FALSE or NA) are coerced to integers by the + operator, with TRUE counting as 1 and FALSE as 0. If either term of the sum is NA, the sum is NA. The addition is vectorised over the rows of the data frame, so the result is a vector of length nrow(test_df).
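Printing the intermediate sum makes the coercion visible. A short sketch, recreating test_df from the question:

```r
test_df <- data.frame(col1 = c("right", "left", "right", NA),
                      col2 = c("left", "right", NA, "right"))

# Adding two logical vectors gives an integer vector:
# TRUE counts as 1, FALSE as 0, and NA propagates
s <- (test_df$col1 == "left") + (test_df$col2 == "right")
s         # [1]  0  2 NA NA
class(s)  # [1] "integer"
```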

Finally, as.logical() converts each element of the sum back into a logical value: a sum of zero becomes FALSE, NA remains NA, and any non-zero integer becomes TRUE.
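A possibly more readable alternative, not part of the original answer, is to compute the ordinary OR and then overwrite every element where either input is missing. The helper name strict_or below is hypothetical:

```r
test_df <- data.frame(col1 = c("right", "left", "right", NA),
                      col2 = c("left", "right", NA, "right"))

# Hypothetical helper: element-wise OR that returns NA whenever either
# operand is NA, instead of following R's usual three-valued logic
strict_or <- function(a, b) {
  res <- a | b                    # ordinary vectorised OR
  res[is.na(a) | is.na(b)] <- NA  # force NA where any input is missing
  res
}

strict_or(test_df$col1 == "left", test_df$col2 == "right")
# [1] FALSE  TRUE    NA    NA
```

This gives the same result as the as.logical() approach but states the intent directly, and the same pattern works for & by swapping the operator.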
