Reputation: 385
This problem is hard to express in a title line, but it is a different question than the usual "how do I change the value in one column based on a value in another column." There are two wrinkles I haven't seen dealt with elsewhere: 1) I'm dealing with NAs, not values, 2) I'm dealing with a condition involving an OR relationship between multiple columns, i.e., if V01
is NA or V02
is NA, then make var foo
(which already exists with no NAs) NA for that row.
Here's a small tibble to illustrate:
tbl <- tibble(
V01 = 10:15,
V02 = 1:6,
V03 = V02 * 2,
foo = V03 * 2
)
# sprinkle around some NAs
tbl[1,2] = NA
tbl[4,1] = NA
tbl[4,2] = NA
tbl[6,1:3] = NA
that produces this table:
# A tibble: 6 x 4
V01 V02 V03 foo
<int> <int> <dbl> <dbl>
1 10 NA 2 4
2 11 2 4 8
3 12 3 6 12
4 NA NA 8 16
5 14 5 10 20
6 NA NA NA 24
Ultimately I want to produce this kind of logic for each row in tbl
:
if (V01 == NA | V02 == NA | V03 == NA) {foo = NA}
I started by trying to use mutate
to change foo
to NA based on one column, like this:
tbl <- tbl %>%
mutate(
foo = case_when(V01 == NA ~ NA)
)
But even in this simple case, mutate changed every value of foo
to NA:
# A tibble: 6 x 4
V01 V02 V03 foo
<int> <int> <dbl> <lgl>
1 10 NA 2 NA
2 11 2 4 NA
3 12 3 6 NA
4 NA NA 8 NA
5 14 5 10 NA
6 NA NA NA NA
I thought maybe I needed a rowwise
in there, but that didn't change the result.
Then I thought maybe it's because NAs are weird, so let's try something simpler: change foo
to -1 if V01
is NA:
tbl <- tbl %>%
mutate(
foo = case_when(V01 == NA ~ -1)
)
But that produced the same result, all values of foo
changed to NA (and the value of foo
in row 4 did not change to -1 either).
Then I decided to do something even simpler. Leave the NAs out of the picture. If V01
is 10 (as in the first row only), change the foo
value in that row to -1. So:
tbl <- tbl %>%
mutate(
foo = case_when(V01 == 10 ~ -1)
)
And that produced this:
# A tibble: 6 x 4
V01 V02 V03 foo
<int> <int> <dbl> <dbl>
1 10 NA 2 -1
2 11 2 4 NA
3 12 3 6 NA
4 NA NA 8 NA
5 14 5 10 NA
6 NA NA NA NA
So why is mutate
changing values in rows that don't meet the specified condition in the mutate
statement?
Note I haven't even gotten to the OR condition yet. For what it's worth, this is the result I want to get:
# A tibble: 6 x 4
V01 V02 V03 foo
<int> <int> <dbl> <dbl>
1 10 NA 2 NA
2 11 2 4 8
3 12 3 6 12
4 NA NA 8 NA
5 14 5 10 20
6 NA NA NA NA
I wasn't able to use the dplyr na_if
function because it only takes a value to replace with NA, not a condition.
Any guidance would be appreciated. Thanks!
Upvotes: 5
Views: 2495
Reputation: 9858
Main solution
You can use if_any()
and is.na
to check rowwise for any NAs. if_any()
obviates the need for either rowwise()
or reduce()
/ Reduce()
, which makes it ideal for row-wise logical opperations.
library(dplyr)
tbl%>%mutate(foo=replace(foo, if_any(matches("V\\d{2}"), is.na), NA))
# A tibble: 6 x 4
V01 V02 V03 foo
<int> <int> <dbl> <dbl>
1 10 NA 2 NA
2 11 2 4 8
3 12 3 6 12
4 NA NA 8 NA
5 14 5 10 20
6 NA NA NA NA
Secondary but possibly very fast solution with map()
and reduce()
:
library(dplyr)
library(purrr)
tbl%>%mutate(foo=ifelse(reduce(map(across(matches("V\\d{2}")), is.na), `|`), NA, foo))
if_any()
is a new and very nice and cleaner alternative to the usual approaches to logical rowwise opperations with rowSums(condition)
/ pmap()
+<-c(...)
/ rowwise()
+c_across()
/ map()
+Reduce()
/ apply(margin=1)
.
For interesting yet slightly different uses of this approach, see my answers in:
R - Remove rows from dataframe that contain only zeros in numeric columns, base R and pipe-friendly methods? and
Find the value of a column over a set of columns in a data frame in R
EDIT
There is an even cleaner solution with if_all()
:
tbl%>%mutate(foo=foo*if_all(everything()))
Upvotes: 1
Reputation: 21908
We can also use this beside all the very good answers you received:
library(dplyr)
library(purrr)
tbl %>%
mutate(foo = pmap_dbl(., ~ c(c(...)[length(tbl)], NA)[any(is.na(c(...)[-length(tbl)])) + 1]))
# A tibble: 6 x 4
V01 V02 V03 foo
<int> <int> <dbl> <dbl>
1 10 NA 2 NA
2 11 2 4 8
3 12 3 6 12
4 NA NA 8 NA
5 14 5 10 20
6 NA NA NA NA
Or this one also can be used:
tbl %>%
mutate(foo = pmap_dbl(., ~ ifelse(any(is.na(c(...)[-length(tbl)])), NA, ..4)))
Upvotes: 2
Reputation: 385
Thank you for all these alternative solutions. What works best for me is a variation on the reduce()
and map()
solution, plugging in the actual list variables to check, rather than doing a match
.
tbl <- tbl %>%
mutate(
foo=ifelse(reduce(map(across(c("V01", "V02", "V03")), is.na), `|`), NA, foo)
)
This is easiest since I'm working with an Election Studies dataset, which has 1300 variables, all starting with "V". Where this problem came from, if anyone is interested, is a byproduct of using the R psych
package to build psychometric scales from some sets of variables, such as 4 questions that measure 'anti-govt' attitudes. Psych
makes it very easy to create and test the reliability of a scale, but its default behavior is to give the scale variable a value even if all the input variables are NA. Given that the dataset has over 8,000 cases, I can afford the 10% or so decline in cases that results from assigning NA to the scale variable if any of the input variables are NA. A useful variant of this solution makes it easy to assign NA to the scale variable only if all the input variables are NA. Just change the |
operator to an &
. So this:
tbl <- tbl %>%
mutate(
foo=ifelse(reduce(map(across(c("V01", "V02", "V03")), is.na), `&`), NA, foo)
produces this:
# A tibble: 6 x 4
V01 V02 V03 foo
<int> <int> <dbl> <dbl>
1 10 NA 2 4
2 11 2 4 8
3 12 3 6 12
4 NA NA 8 16
5 14 5 10 20
6 NA NA NA NA
)
Another smaller side issue is the use of base R ifelse()
vs dplyr if_else()
. The latter doesn't work here because it requires both alternatives to be of the same type. This post is useful on the differences between the 2 functions.
Upvotes: -1
Reputation: 26218
Base R strategy
tbl$foo <- tbl$foo * as.logical(rowSums(tbl))
tbl
# A tibble: 6 x 4
V01 V02 V03 foo
<int> <int> <dbl> <dbl>
1 10 NA 2 NA
2 11 2 4 8
3 12 3 6 12
4 NA NA 8 NA
5 14 5 10 20
6 NA NA NA NA
using cur_data()
library(dplyr, warn.conflicts = F)
tbl %>% mutate(foo = pmin(foo, rowSums(cur_data())))
#> # A tibble: 6 x 4
#> V01 V02 V03 foo
#> <int> <int> <dbl> <dbl>
#> 1 10 NA 2 NA
#> 2 11 2 4 8
#> 3 12 3 6 12
#> 4 NA NA 8 NA
#> 5 14 5 10 20
#> 6 NA NA NA NA
Note: Assuming columns don't have negative values
If there are negative values, you may do
tbl %>% mutate(foo = foo * as.logical(rowSums(cur_data())))
# A tibble: 6 x 4
V01 V02 V03 foo
<int> <int> <dbl> <dbl>
1 10 NA 2 NA
2 11 2 4 8
3 12 3 6 12
4 NA NA 8 NA
5 14 5 10 20
6 NA NA NA NA
Upvotes: 2
Reputation: 7106
library(tidyverse)
tbl <- tibble(
V01 = 10:15,
V02 = 1:6,
V03 = V02 * 2,
foo = V03 * 2
)
# sprinkle around some NAs
tbl[1,2] = NA
tbl[4,1] = NA
tbl[4,2] = NA
tbl[6,1:3] = NA
#alternative faster way
criteria <- apply(tbl[, -4], 1, \(.x) .x) %>%
as.data.frame() %>%
map_lgl(~ any(is.na(.x)))
mutate(tbl,foo = ifelse(criteria, NA, foo))
#> # A tibble: 6 x 4
#> V01 V02 V03 foo
#> <int> <int> <dbl> <dbl>
#> 1 10 NA 2 NA
#> 2 11 2 4 8
#> 3 12 3 6 12
#> 4 NA NA 8 NA
#> 5 14 5 10 20
#> 6 NA NA NA NA
tbl %>% rowwise() %>%
mutate(foo = ifelse(any(is.na(c_across(matches('[0-9]$')))), NA, foo))
#> # A tibble: 6 x 4
#> # Rowwise:
#> V01 V02 V03 foo
#> <int> <int> <dbl> <dbl>
#> 1 10 NA 2 NA
#> 2 11 2 4 8
#> 3 12 3 6 12
#> 4 NA NA 8 NA
#> 5 14 5 10 20
#> 6 NA NA NA NA
#if foo has no NA's
tbl %>% rowwise() %>%
mutate(foo = ifelse(any(is.na(c_across(everything()))), NA, foo))
#> # A tibble: 6 x 4
#> # Rowwise:
#> V01 V02 V03 foo
#> <int> <int> <dbl> <dbl>
#> 1 10 NA 2 NA
#> 2 11 2 4 8
#> 3 12 3 6 12
#> 4 NA NA 8 NA
#> 5 14 5 10 20
#> 6 NA NA NA NA
Created on 2021-06-16 by the reprex package (v2.0.0)
Created on 2021-06-16 by the reprex package (v2.0.0)
Foo column can also be created while looking for NA's
library(tidyverse)
#no foo column
tbl_no_foo <- tibble(
V01 = 10:15,
V02 = 1:6,
V03 = V02 * 2
)
# sprinkle around some NAs
tbl_no_foo[1,2] = NA
tbl_no_foo[4,1] = NA
tbl_no_foo[4,2] = NA
tbl_no_foo[6,1:3] = NA
tbl_no_foo %>% rowwise() %>%
mutate(foo = ifelse(any(is.na(c_across(matches('[0-9]$')))), NA, V03 * 2))
#> # A tibble: 6 x 4
#> # Rowwise:
#> V01 V02 V03 foo
#> <int> <int> <dbl> <dbl>
#> 1 10 NA 2 NA
#> 2 11 2 4 8
#> 3 12 3 6 12
#> 4 NA NA 8 NA
#> 5 14 5 10 20
#> 6 NA NA NA NA
Created on 2021-06-16 by the reprex package (v2.0.0)
Upvotes: 0
Reputation: 388797
Here are couple of options -
rowSums
-library(dplyr)
tbl %>%
mutate(foo = replace(foo, rowSums(is.na(select(., starts_with('V')))) > 0, NA))
# V01 V02 V03 foo
# <int> <int> <dbl> <dbl>
#1 10 NA 2 NA
#2 11 2 4 8
#3 12 3 6 12
#4 NA NA 8 NA
#5 14 5 10 20
#6 NA NA NA NA
rowwise
-tbl %>%
rowwise() %>%
mutate(foo = if(any(is.na(c_across(starts_with('V'))))) NA else foo)
Both of this replace foo
with NA
if there is a NA
value in any of the columns starting with V
.
As far as this code is concerned
tbl <- tbl %>%
mutate(foo = case_when(V01 == 10 ~ -1))
case_when
by default returns NA
if no condition is satisfied. To get the same value back you can do
tbl %>%
mutate(foo = case_when(V01 == 10 ~ -1,
TRUE ~ foo))
Upvotes: 1