Stephen Poole
Stephen Poole

Reputation: 391

How can I assign values to a mutated variable using outputs of if statements?

I'm trying to assign a value of either "Apple," "Banana" or "other" if there is no missing value in that row of the data frame. There should only be one non-missing value per row.

A <- c(1,NA,3,NA,NA,6,NA)
B <- c(NA,2,NA,4,5,NA,NA)
C <- c(NA,NA,NA,NA,NA,NA,7)

fruit_df <- tibble(A,B,C)

fruit_counts <- fruit_df %>%
  mutate(
    vendor = if(is.na(A) == FALSE) { "Apple" }
              else if (is.na(B) == FALSE) { "Banana" }
              else { "other"}
  )

I'm getting a syntax error and not sure what I'm doing wrong. Here is the error message:

 Warning message:
Problem with `mutate()` column `fruit_type`.
ℹ `fruit_type = if (...) NULL`.
ℹ the condition has length > 1 and only the first element will be used 

Originally I just wanted to carry the column name (i.e., A, B, C) to denote where the values weren't missing. In other words, something like this:

  # A tibble: 7 x 4
      A     B     C fruit_type
  <dbl> <dbl> <dbl> <chr>     
1     1    NA    NA A         
2    NA     2    NA B         
3     3    NA    NA A         
4    NA     4    NA B         
5    NA     5    NA B         
6     6    NA    NA A         
7    NA    NA     7 C     

I wanted to be able to specify any number of columns by name and have a function search the rows and carry the column name into the new variable (fruit_type) where the value is not missing. I'm pretty sure I need apply functions for that. But honestly if it's too complex I'll type each out using if-else.

Upvotes: 1

Views: 448

Answers (4)

TarJae
TarJae

Reputation: 79311

Here is another tidyverse solution: Ideas are mainly from akrun (row3 and row6).

library(tidyverse)
fruit_df %>% 
    rename_with(~ c("Apple", "Banana", "other")) %>% 
    mutate(across(1:3, ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}'), .keep="unused") %>%
    unite(vendor, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
    bind_cols(fruit_df, .)

output:

  Apple Banana other vendor
  <dbl>  <dbl> <dbl> <chr> 
1     1     NA    NA Apple 
2    NA      2    NA Banana
3     3     NA    NA Apple 
4    NA      4    NA Banana
5    NA      5    NA Banana
6     6     NA    NA Apple 
7    NA     NA     7 other 

Upvotes: 1

Martin Gal
Martin Gal

Reputation: 16998

Let's take a look at your error message:

 Warning message:
Problem with `mutate()` column `fruit_type`.
ℹ `fruit_type = if (...) NULL`.
ℹ the condition has length > 1 and only the first element will be used 

First of all, it's actually a warning, so your code did do something but most likely not the thing you wanted it to. An error stops the function's execution.

In this case, your if-function looked like

if(is.na(A) == FALSE) { "Apple" }
  else (...)

Unfortunately this if-function isn't vectorized. So:

> is.na(A)
[1] FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE

is evaluated for every row of your data.frame and if takes only the first element of is.na(A). Therefore the condition

is.na(A) == FALSE

is equivalent to

is.na(A)[1] == FALSE

which is TRUE (since is.na(A)[1] is FALSE). This causes your code to return Apple for each row of your data.frame. Assuming you don't want this behaviour, you should use the vectorized functions ifelse or dplyr::if_else.

As mentioned by @andrew_reece one alternative could be using case_when:

library(dplyr)

fruit_df %>%
  mutate( 
    vendor = case_when(is.na(A) == FALSE ~ "Apple",
                       is.na(B) == FALSE ~ "Banana",
                       TRUE              ~  "other")
  )

which returns

# A tibble: 7 x 4
      A     B     C vendor
  <dbl> <dbl> <dbl> <chr> 
1     1    NA    NA Apple 
2    NA     2    NA Banana
3     3    NA    NA Apple 
4    NA     4    NA Banana
5    NA     5    NA Banana
6     6    NA    NA Apple 
7    NA    NA     7 other 

Upvotes: 4

andrew_reece
andrew_reece

Reputation: 21284

Use if_else or case_when:

fruit_df %>%
  mutate(vendor = if_else(!is.na(A), "Apple", if_else(!is.na(B), "Banana", "other")))

# A tibble: 7 x 4
      A     B     C vendor
  <dbl> <dbl> <dbl> <chr> 
1     1    NA    NA Apple 
2    NA     2    NA Banana
3     3    NA    NA Apple 
4    NA     4    NA Banana
5    NA     5    NA Banana
6     6    NA    NA Apple 
7    NA    NA     7 other 

If you need fruit_type as well as vendor (your post isn't clear), you can use rowwise():

fruit_df %>%
  rowwise() %>% 
  mutate(fruit_type = names(fruit_df)[!is.na(cur_data())],
         vendor = if_else(!is.na(A), "Apple", if_else(!is.na(B), "Banana", "other")))

# A tibble: 7 x 5
# Rowwise: 
      A     B     C fruit_type vendor
  <dbl> <dbl> <dbl> <chr>      <chr> 
1     1    NA    NA A          Apple 
2    NA     2    NA B          Banana
3     3    NA    NA A          Apple 
4    NA     4    NA B          Banana
5    NA     5    NA B          Banana
6     6    NA    NA A          Apple 
7    NA    NA     7 C          other 

Upvotes: 3

akrun
akrun

Reputation: 887951

Using apply in base R

fruit_df$fruit_type <- apply(fruit_df, 1, function(x) names(x)[!is.na(x)])

-output

fruit_df$fruit_type
[1] "A" "B" "A" "B" "B" "A" "C"

or with max.col in base R

names(fruit_df)[max.col(!is.na(fruit_df))]

or similar option in dplyr

library(dplyr)
fruit_df %>% 
    rename_with(~ c("Apple", "Banana", "other")) %>% 
    transmute(fruit_type = names(.)[max.col(!is.na(.))]) %>% 
    bind_cols(fruit_df, .)
# A tibble: 7 x 4
      A     B     C fruit_type
  <dbl> <dbl> <dbl> <chr>     
1     1    NA    NA Apple     
2    NA     2    NA Banana    
3     3    NA    NA Apple     
4    NA     4    NA Banana    
5    NA     5    NA Banana    
6     6    NA    NA Apple     
7    NA    NA     7 other     

Upvotes: 4

Related Questions