shu251

Reputation: 251

Fill column with unite() using mutate() and case_when() in R, tidyverse

I have a list of names, and each name has assigned thresholds that determine whether the name is appropriately assigned.

You can recreate a test dataset using this:

df <- data.frame(level1 = c("Eukaryota","Eukaryota","Eukaryota","Eukaryota","Eukaryota"), 
             level2=c("Opisthokonta","Alveolata","Opisthokonta","Alveolata","Alveolata"), 
             level3=c("Fungi","Ciliophora","Fungi","Ciliophora","Dinoflagellata"),
             level4=c("Basidiomycota","Spirotrichea","Basidiomycota","Spirotrichea","Dinophyceae"), 
             value = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40","100;80;20;0"))

I'd like to use tidyverse mutate() and case_when() to find the taxonomic level that passes a suitable threshold. The tidyverse statement below splits up the threshold values and then attempts to do this. My bottlenecks:

  1. Using case_when() versus an ifelse() statement: would ifelse() be more appropriate?
  2. I can't figure out how to fill the new column, Name_updated, with a concatenation of level1 through levelX. As written, unite() is not appropriate, because it operates on whole data frames. In reality I have many more columns, so doing this without the tidyverse level1:level3 syntax would be painful!

df_updated <- df %>% 
  separate(value, c("threshold1","threshold2", "threshold3", "threshold4"), sep =";") %>% 
  mutate(Name_updated = case_when(
    threshold4 >= 50 ~ unite(level1:level4, sep = ";"), #Fill with all taxonomic names to level4
    threshold4 < 50 & threshold3 >= 60 ~ unite(level1:level3, sep = ";"), #If last threshold is <50, only fill with taxonomic names to level3
    threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ unite(level1:level2, sep = ";"), #If thresholds for level 3 and 4 are below, fill only level1;level2
    TRUE ~ level1)) %>% #Otherwise fill with only level 1
  data.frame

Desired output

> df_updated$Name_updated
# Output of this new list:
Eukaryota
Eukaryota;Alveolata;Ciliophora;Spirotrichea
Eukaryota;Opisthokonta;Fungi;Basidiomycota
Eukaryota;Alveolata
Eukaryota;Alveolata

A desired next step is to write a function that lets the user specify the threshold values used in the script, so the logic that determines which threshold passes needs to be robust.

Upvotes: 1

Views: 755

Answers (1)

akrun

Reputation: 887691

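The issue is both with unite() and with the type of the columns created by separate(). unite() takes a whole data frame and returns a data frame, so it cannot supply the values on the right-hand side of case_when(). By default, separate() uses convert = FALSE, so the new threshold columns are of class character and comparisons such as threshold4 >= 50 are made on strings rather than numbers.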
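A quick way to see why the character type matters (a minimal illustration with a made-up variable x, not part of the original data): when one side of a comparison is character, R coerces the other side to character as well, so the test is done on strings.

```r
x <- "100"                       # what separate() yields with convert = FALSE
chr_cmp <- x >= 50               # FALSE: compared as the strings "100" and "50"
num_cmp <- as.numeric(x) >= 50   # TRUE: compared as numbers
```

The pipeline that follows avoids this by passing convert = TRUE to separate().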

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)

df %>% 
  # Make sure the level columns are character, not factor
  type.convert(as.is = TRUE) %>%
  # convert = TRUE turns the new threshold columns into numbers
  separate(value, c("threshold1", "threshold2", "threshold3", "threshold4"),
           sep = ";", convert = TRUE) %>% 
  mutate(Name_updated = case_when(
    # Each branch pastes the selected level columns together row by row
    threshold4 >= 50 ~
      select(., starts_with('level')) %>% reduce(str_c, sep = ";"),
    threshold4 < 50 & threshold3 >= 60 ~ 
      select(., level1:level3) %>% reduce(str_c, sep = ";"), 
    threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ 
      select(., level1:level2) %>% reduce(str_c, sep = ";"), 
    TRUE ~ level1))
#  level1       level2         level3        level4 threshold1 threshold2 threshold3 threshold4
#1 Eukaryota Opisthokonta          Fungi Basidiomycota        100          5          4          2
#2 Eukaryota    Alveolata     Ciliophora  Spirotrichea        100        100        100        100
#3 Eukaryota Opisthokonta          Fungi Basidiomycota        100         80         60         50
#4 Eukaryota    Alveolata     Ciliophora  Spirotrichea         90         50         40         40
#5 Eukaryota    Alveolata Dinoflagellata   Dinophyceae        100         80         20          0
#                                 Name_updated
#1                                   Eukaryota
#2 Eukaryota;Alveolata;Ciliophora;Spirotrichea
#3  Eukaryota;Opisthokonta;Fungi;Basidiomycota
#4                         Eukaryota;Alveolata
#5                         Eukaryota;Alveolata
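
On the follow-up about letting the user choose the thresholds, one possible sketch is below. It is an alternative in base R rather than an extension of the case_when() pipeline, and it rests on assumptions: update_names() and its cutoffs argument are hypothetical names, the columns are assumed to be called level1..levelN, and every value string is assumed to hold exactly N ";"-separated numbers. It keeps each name down to the deepest level whose threshold meets its cutoff, which should match the nested conditions above.

```r
# Hypothetical helper, not from the original answer.
# cutoffs[k] is the minimum threshold for keeping level k;
# level 1 is always kept, so cutoffs[1] is ignored.
update_names <- function(df, cutoffs = c(0, 50, 60, 50), sep = ";") {
  n_levels <- length(cutoffs)
  level_cols <- paste0("level", seq_len(n_levels))

  # Split "100;5;4;2" into a numeric matrix, one column per level
  parts <- strsplit(as.character(df$value), ";", fixed = TRUE)
  thr <- do.call(rbind, lapply(parts, as.numeric))
  stopifnot(ncol(thr) == n_levels)

  # passes[i, k] is TRUE when row i meets the cutoff for level k
  passes <- sweep(thr, 2, cutoffs, FUN = ">=")
  passes[, 1] <- TRUE
  # Deepest passing level for each row
  depth <- apply(passes, 1, function(p) max(which(p)))

  # Paste level1..level<depth> for each row
  lev <- as.matrix(df[level_cols])
  df$Name_updated <- vapply(seq_len(nrow(df)), function(i) {
    paste(lev[i, seq_len(depth[i])], collapse = sep)
  }, character(1))
  df
}

df <- data.frame(
  level1 = c("Eukaryota", "Eukaryota", "Eukaryota", "Eukaryota", "Eukaryota"),
  level2 = c("Opisthokonta", "Alveolata", "Opisthokonta", "Alveolata", "Alveolata"),
  level3 = c("Fungi", "Ciliophora", "Fungi", "Ciliophora", "Dinoflagellata"),
  level4 = c("Basidiomycota", "Spirotrichea", "Basidiomycota", "Spirotrichea", "Dinophyceae"),
  value = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40", "100;80;20;0")
)

res <- update_names(df)$Name_updated
```

With the default cutoffs, res should reproduce the Name_updated column shown above. Passing a longer cutoffs vector extends the same rule to more levels without writing out each case_when() branch by hand.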

Upvotes: 0
