Reputation: 5
First question ever for me! Hope I make it clear and simple :D
I want to add a new column which content if based on a condition based on all other columns content.
I have the following (kinda messy) data-frame and I'm trying to add a column that should contain "Yes" if one other column contains 0 or "No" if none of the other columns contains "0".
library(tidyverse)
library(readxl)
all_extract <- read_excel("all_extract.xlsx")
View(all_extract)
all_extract
A tibble: 25 x 14
tag1 `tag1 -1` tag2 `tag2 -1` tag3 `tag3 -1` tag4 `tag4 -1` tag5 `tag5 -1` tag6 `tag6 -1` tag7 `tag7-1`
<dbl> <lgl> <dbl> <lgl> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl>
1 2 NA 0 NA 1 NA 2 NA 1 -1 1 NA 1 0
2 2 NA 2 NA 1 NA 2 NA 2 NA 1 NA 2 0
3 1 NA 1 NA 1 NA 2 NA 1 NA 2 NA 1 0
4 1 NA 2 NA 1 NA 2 NA 1 NA 1 NA 1 0
5 2 NA 1 NA 1 NA 2 NA 1 NA 2 NA 1 0
6 2 NA 1 NA 1 NA 2 NA 0 NA 1 NA 2 0
7 1 NA 2 NA 1 NA 2 NA 0 NA 1 NA 1 0
8 1 NA 1 NA 2 NA 2 NA 1 NA 1 NA 1 0
9 1 NA 2 NA 1 NA 2 NA 1 NA 2 NA 1 0
10 1 NA 2 NA 1 NA 2 -1 1 NA 2 NA 1 0
# … with 15 more rows
For replication, file can be download with this link.
I've tried to do it with dplyr mutate
, but it works with one or several named columns, not with all columns; I need to write a script that I can apply to a bunch of files that have not the same number of columns. If my understanding is correct, mutate_all
and mutate_if
do not do what I want, as unlike mutate
they do not add a new column but change the content of the existing one that match the criteria. Yet, they are the only thing I can find when searching for help on the internet.
For instance, I can apply mutate
with a condition on one column:
all <- all_extract %>%
mutate(Final_schwa = case_when(syll_tag2==0 ~ 'Schwa',
TRUE ~ 'No schwa'))
I have find many solution for multiple columns, none for all columns.
I also tried to use across
within mutate
, but I don't know how to indicate all columns. Ideally, I thought I could use something like:
mutate(across(starts_with("tag"), "Yes"))
But I cannot managed to get the right syntax for it to work. Also, I'm not even sure across
is supposed to be able to do what I want.
So: is there a way to apply real mutate
(for adding a column) with a condition based on all other columns (for the new column to be filled with "Yes/No" content based on the all-columns condition)?
Thanks in advance if you could help!
Upvotes: 0
Views: 167
Reputation: 3257
pivot_longer
As Konrad has pointed out, this solution is superior to the one below because it preserves data types - if your data contains multiple types, they would be coerced into a single type by t
which could result in data loss.
new_col <-
all_extract %>%
pivot_longer(everything()) %>% #This will make a 2-col df w/ names: (name, value)
group_by(name) %>%
summarise(any_zero = any(value == 0))
rbind(all_extract, new_col["any_zero"])
summarise(across(everything))
I think this solution is a little more intuitive if you're a visual thinker, but there's a risk of data loss, because t
returns a matrix, and matrices can't contain multiple types.
new_col <-
all_extract %>%
t %>%
as_tibble %>% # (because t() returns a matrix)
summarise(across(everything(), function(x){
any(x == 0)
})) %>%
t # transposing again will coerce our single row df into a column vector
all_extract$new_col <- new_col
Upvotes: 2
Reputation: 389235
Do you want to add 'Yes' if there is atleast 1 zero in a row?
You can use rowSums
as :
all_extract$new_col <- ifelse(rowSums(all_extract == 0, na.rm = TRUE) > 0, 'yes', 'no')
#Without ifelse
all_extract$new_col <- c('No', 'Yes')[(rowSums(all_extract == 0, na.rm = TRUE) > 0) + 1]
Upvotes: 1