I have this data frame with two columns, each of which can take the value "left" or "right" (or NA).
test_df <- data.frame(col1 = c("right","left","right",NA),
                      col2 = c("left","right",NA,"right"))
test_df
#    col1  col2
# 1 right  left
# 2  left right
# 3 right  <NA>
# 4  <NA> right
Now I want to test this compound condition:
test_df$col1 == "left" | test_df$col2 == "right"
# [1] FALSE  TRUE    NA  TRUE
The first three results are as expected, but why is the last result TRUE instead of NA? What is the difference between rows 3 and 4?
In your code you are testing whether at least one of the following conditions is fulfilled: "left" in col1 or "right" in col2. In row 4 you have "right" in col2, so the result is TRUE, irrespective of what may or may not be in col1. The situation is different in row 3. There, col1 does not contain "left", so it remains to be seen whether col2 contains "right" before the statement can be declared FALSE or TRUE. However, since the entry in col2 for row 3 is NA, the comparison cannot be decided and, accordingly, the output is NA.

In other words, R's | follows three-valued logic: NA | TRUE is TRUE because the outcome is the same whatever the missing value turns out to be, whereas NA | FALSE is NA because the outcome depends on it.
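To see this row by row, the sketch below evaluates each side of the condition separately before combining them with |. The names cond1 and cond2 are introduced only for illustration; they are not part of the original code.

```r
# Three-valued logic of R's vectorised `|`:
# a TRUE on one side decides the result, a FALSE on one side does not
NA | TRUE    # TRUE
NA | FALSE   # NA

test_df <- data.frame(col1 = c("right", "left", "right", NA),
                      col2 = c("left", "right", NA, "right"))

# Evaluate each side of the condition on its own (illustrative names)
cond1 <- test_df$col1 == "left"    # FALSE  TRUE FALSE    NA
cond2 <- test_df$col2 == "right"   # FALSE  TRUE    NA  TRUE

# Row 3: FALSE | NA -> NA;  row 4: NA | TRUE -> TRUE
cbind(cond1, cond2, either = cond1 | cond2)
```

The only difference between rows 3 and 4 is which side holds the NA and what the other side evaluates to.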
If you want an expression that performs the comparison between the entries in col1 and col2 that you mentioned, but returns NA if either of the entries in those two columns is NA, you could use
as.logical((test_df$col1 == "left") + (test_df$col2 == "right"))
#[1] FALSE  TRUE    NA    NA
In this line of code, the results of the individual comparisons (TRUE or FALSE) are coerced into integers by the + operator, with TRUE becoming 1 and FALSE becoming 0. If either part of the sum is NA, the sum will be NA. This addition is done for each row of the data frame, so the result is a vector of length nrow(test_df).
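The intermediate integer vector can be inspected directly. This sketch simply splits the one-liner above into two steps; the name sums is illustrative.

```r
test_df <- data.frame(col1 = c("right", "left", "right", NA),
                      col2 = c("left", "right", NA, "right"))

# `+` turns each logical into an integer (FALSE -> 0L, TRUE -> 1L);
# an NA on either side makes that element of the sum NA
sums <- (test_df$col1 == "left") + (test_df$col2 == "right")

sums                            # 0  2 NA NA
class(sums)                     # "integer"
length(sums) == nrow(test_df)   # TRUE
```

Row 2 sums to 2 because both comparisons are TRUE; rows 3 and 4 are NA because each contains one NA comparison.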
By wrapping the sum in as.logical(), it is converted back into logical values, again element by element. A sum of zero becomes FALSE, an NA remains NA, and any non-zero integer (here 1 or 2) becomes TRUE.
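The conversion rule can be checked on a plain integer vector. For comparison, the same "NA if anything is missing" result can also be obtained by computing the ordinary | and then masking the rows where either comparison is NA. This masking version is an alternative formulation, not taken from the answer above, and either_strict is an illustrative name.

```r
# as.logical() on integers: 0 -> FALSE, any non-zero -> TRUE, NA stays NA
as.logical(c(0L, 1L, 2L, NA))   # FALSE  TRUE  TRUE    NA

test_df <- data.frame(col1 = c("right", "left", "right", NA),
                      col2 = c("left", "right", NA, "right"))

cond1 <- test_df$col1 == "left"
cond2 <- test_df$col2 == "right"

# Alternative: ordinary `|`, then force NA wherever either comparison is NA
either_strict <- cond1 | cond2
either_strict[is.na(cond1) | is.na(cond2)] <- NA

either_strict                   # FALSE  TRUE    NA    NA

# Agrees with the sum-based one-liner
identical(either_strict, as.logical(cond1 + cond2))   # TRUE
```

The masking version states the intent explicitly at the cost of an extra line, while the sum-based version is more compact but relies on implicit integer coercion.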