Reputation: 391
I'm trying to assign a value of either "Apple," "Banana" or "other" if there is no missing value in that row of the data frame. There should only be one non-missing value per row.
A <- c(1,NA,3,NA,NA,6,NA)
B <- c(NA,2,NA,4,5,NA,NA)
C <- c(NA,NA,NA,NA,NA,NA,7)
fruit_df <- tibble(A,B,C)
fruit_counts <- fruit_df %>%
mutate(
vendor = if(is.na(A) == FALSE) { "Apple" }
else if (is.na(B) == FALSE) { "Banana" }
else { "other"}
)
I'm getting a syntax error and not sure what I'm doing wrong. Here is the error message:
Warning message:
Problem with `mutate()` column `fruit_type`.
ℹ `fruit_type = if (...) NULL`.
ℹ the condition has length > 1 and only the first element will be used
Originally I just wanted to carry the column name (i.e., A, B, C) to denote where the values weren't missing. In other words, something like this:
# A tibble: 7 x 4
A B C fruit_type
<dbl> <dbl> <dbl> <chr>
1 1 NA NA A
2 NA 2 NA B
3 3 NA NA A
4 NA 4 NA B
5 NA 5 NA B
6 6 NA NA A
7 NA NA 7 C
I wanted to be able to specify any number of columns by name and have a function search the rows and carry the column name into the new variable (fruit_type) where the value is not missing. I'm pretty sure I need apply functions for that. But honestly if it's too complex I'll type each out using if-else.
Upvotes: 1
Views: 448
Reputation: 79311
Here is another tidyverse solution: Ideas are mainly from akrun (row3 and row6).
library(tidyverse)
fruit_df %>%
rename_with(~ c("Apple", "Banana", "other")) %>%
mutate(across(1:3, ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}'), .keep="unused") %>%
unite(vendor, starts_with('new'), na.rm = TRUE, sep = ' ') %>%
bind_cols(fruit_df, .)
output:
Apple Banana other vendor
<dbl> <dbl> <dbl> <chr>
1 1 NA NA Apple
2 NA 2 NA Banana
3 3 NA NA Apple
4 NA 4 NA Banana
5 NA 5 NA Banana
6 6 NA NA Apple
7 NA NA 7 other
Upvotes: 1
Reputation: 16998
Let's take a look at your error message:
Warning message:
Problem with `mutate()` column `fruit_type`.
ℹ `fruit_type = if (...) NULL`.
ℹ the condition has length > 1 and only the first element will be used
First of all, it's actually a warning, so your code did do something but most likely not the thing you wanted it to. An error stops the function's execution.
In this case, your if
-function looked like
if(is.na(A) == FALSE) { "Apple" }
else (...)
Unfortunately this if
-function isn't vectorized. So:
> is.na(A)
[1] FALSE TRUE FALSE TRUE TRUE FALSE TRUE
is evaluated for every row of your data.frame and if
takes only the first element of is.na(A)
. Therefore the condition
is.na(A) == FALSE
is equivalent to
is.na(A)[1] == FALSE
which is TRUE
(since is.na(A)[1]
is FALSE
). This causes your code to return Apple
for each row of your data.frame. Assuming you don't want this behaviour, you should use the vectorized functions ifelse
or dplyr::if_else
.
As mentioned by @andrew_reece one alternative could be using case_when
:
library(dplyr)
fruit_df %>%
mutate(
vendor = case_when(is.na(A) == FALSE ~ "Apple",
is.na(B) == FALSE ~ "Banana",
TRUE ~ "other")
)
which returns
# A tibble: 7 x 4
A B C vendor
<dbl> <dbl> <dbl> <chr>
1 1 NA NA Apple
2 NA 2 NA Banana
3 3 NA NA Apple
4 NA 4 NA Banana
5 NA 5 NA Banana
6 6 NA NA Apple
7 NA NA 7 other
Upvotes: 4
Reputation: 21284
Use if_else
or case_when
:
fruit_df %>%
mutate(vendor = if_else(!is.na(A), "Apple", if_else(!is.na(B), "Banana", "other")))
# A tibble: 7 x 4
A B C vendor
<dbl> <dbl> <dbl> <chr>
1 1 NA NA Apple
2 NA 2 NA Banana
3 3 NA NA Apple
4 NA 4 NA Banana
5 NA 5 NA Banana
6 6 NA NA Apple
7 NA NA 7 other
If you need fruit_type
as well as vendor
(your post isn't clear), you can use rowwise()
:
fruit_df %>%
rowwise() %>%
mutate(fruit_type = names(fruit_df)[!is.na(cur_data())],
vendor = if_else(!is.na(A), "Apple", if_else(!is.na(B), "Banana", "other")))
# A tibble: 7 x 5
# Rowwise:
A B C fruit_type vendor
<dbl> <dbl> <dbl> <chr> <chr>
1 1 NA NA A Apple
2 NA 2 NA B Banana
3 3 NA NA A Apple
4 NA 4 NA B Banana
5 NA 5 NA B Banana
6 6 NA NA A Apple
7 NA NA 7 C other
Upvotes: 3
Reputation: 887951
Using apply
in base R
fruit_df$fruit_type <- apply(fruit_df, 1, function(x) names(x)[!is.na(x)])
-output
fruit_df$fruit_type
[1] "A" "B" "A" "B" "B" "A" "C"
or with max.col
in base R
names(fruit_df)[max.col(!is.na(fruit_df))]
or similar option in dplyr
library(dplyr)
fruit_df %>%
rename_with(~ c("Apple", "Banana", "other")) %>%
transmute(fruit_type = names(.)[max.col(!is.na(.))]) %>%
bind_cols(fruit_df, .)
# A tibble: 7 x 4
A B C fruit_type
<dbl> <dbl> <dbl> <chr>
1 1 NA NA Apple
2 NA 2 NA Banana
3 3 NA NA Apple
4 NA 4 NA Banana
5 NA 5 NA Banana
6 6 NA NA Apple
7 NA NA 7 other
Upvotes: 4