R data frame compare factor levels

Question

I have the following dataframe:

df = data.frame(A=c("CLASS_3", "CLASS_3", "CLASS_1", "CLASS_0", "CLASS_2"), B=c("CLASS_0", "CLASS_1", "CLASS_1", "CLASS_0", "CLASS_3"), C=c("CLASS_0", "CLASS_0", "CLASS_2", "CLASS_0", "CLASS_2"), D=c("CLASS_3", "CLASS_4", "CLASS_2", "CLASS_0", "CLASS_2"),E=c("CLASS_4", "CLASS_4", "CLASS_1", "CLASS_1", "CLASS_2"), F=c("CLASS_3", "CLASS_2", "CLASS_1", "CLASS_0", "CLASS_2"))
row.names(df) <- c("gene1", "gene2", "gene3", "gene4", "gene5")

Every gene is assigned one of five classes (CLASS_0 to CLASS_4) under each of six conditions (A to F).

I want to check whether the class changes between conditions. I am only interested in switches from CLASS_0 to CLASS_3 or CLASS_4, so conditions (columns) are always compared in pairs. If there is a switch, I want to record it in two new columns, SWITCH0->3 and SWITCH0->4.

This is my expected output:

Here, for gene1, there is a SWITCH0->3 from B to A, B to D, B to F, C to A, C to D, C to F, and a SWITCH0->4 from B to E and C to E.

Using dplyr, I get all rows that contain CLASS_0 and CLASS_4, but how do I construct the new column?

df %>% filter_all(any_vars(. %in% c('CLASS_0'))) %>% filter_all(any_vars(. %in% c('CLASS_4')))

UPDATE: I added three more cases to the data:

A row may contain none of CLASS_0, CLASS_3, or CLASS_4 (as in gene3).
A row may contain neither CLASS_3 nor CLASS_4 (as in gene4).
A row may contain no CLASS_0 (as in gene5).

Sinh Nguyen · Accepted Answer

Here is one way to do this using dplyr, tidyr, and purrr.

Preparation

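# the updated data from the question (all five genes), as used in the output below
df = data.frame(A=c("CLASS_3", "CLASS_3", "CLASS_1", "CLASS_0", "CLASS_2"), B=c("CLASS_0", "CLASS_1", "CLASS_1", "CLASS_0", "CLASS_3"), C=c("CLASS_0", "CLASS_0", "CLASS_2", "CLASS_0", "CLASS_2"), D=c("CLASS_3", "CLASS_4", "CLASS_2", "CLASS_0", "CLASS_2"), E=c("CLASS_4", "CLASS_4", "CLASS_1", "CLASS_1", "CLASS_2"), F=c("CLASS_3", "CLASS_2", "CLASS_1", "CLASS_0", "CLASS_2"))
row.names(df) <- c("gene1", "gene2", "gene3", "gene4", "gene5")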

library(dplyr)
library(tidyr)
library(purrr)

# Build a comma-separated string of every "origin-dest" pair,
# e.g. origin = c("B", "C"), dest = "E" gives "B-E,C-E"
generate_switch_string <- function(origin, dest) {
  paste(unlist(map(origin,
                   paste, sep = "-",
                   dest)),
        collapse = ",")
}

# create a gene column from the row names
df <- df %>% mutate(gene = row.names(.))
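
To see what the helper produces, and as a cross-check that does not depend on purrr, the same pairing can be sketched in base R with outer(). The name switch_pairs_base is hypothetical and not part of the original answer, and returning NA when either side is empty is an assumption made here so that genes without a switch are easy to spot.

```r
# Hypothetical base R variant of generate_switch_string (not from the original answer).
switch_pairs_base <- function(origin, dest) {
  # Assumption: report NA when there is nothing to pair.
  if (length(origin) == 0 || length(dest) == 0) return(NA_character_)
  # outer() builds an origin x dest matrix of "origin-dest" labels;
  # t() followed by as.vector() reads it row by row, so pairs are grouped by origin.
  pairs <- outer(origin, dest, paste, sep = "-")
  paste(as.vector(t(pairs)), collapse = ",")
}

switch_pairs_base(c("B", "C"), c("A", "D", "F"))
#> [1] "B-A,B-D,B-F,C-A,C-D,C-F"
switch_pairs_base(c("B", "C"), character(0))
#> [1] NA
```

The first call reproduces the switch_0->3 string for gene1, where B and C hold CLASS_0 and A, D, and F hold CLASS_3.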

Generate the switching combination string

combination_df <- df %>%  
  # reshape to long format: one row per gene/condition pair
  gather(key = class_name, value = class_value, A:F) %>%
  # keep only the classes of interest
  filter(class_value %in% c("CLASS_0", "CLASS_3", "CLASS_4")) %>%
  group_by(gene) %>%
  # keep genes that have a CLASS_0 and at least one CLASS_3 or CLASS_4
  filter(any(class_value == "CLASS_0") & n_distinct(class_value) > 1) %>%
  # collect the condition names for each class of each gene
  group_by(gene, class_value) %>%
  summarize(class_names = list(class_name), .groups = "drop") %>%
  # build the switch strings with the helper defined above
  group_by(gene) %>%
  summarize("switch_0->3" = 
              generate_switch_string(
                unlist(class_names[class_value == "CLASS_0"]),
                unlist(class_names[class_value == "CLASS_3"])),
            "switch_0->4" = 
              generate_switch_string(
                unlist(class_names[class_value == "CLASS_0"]),
                unlist(class_names[class_value == "CLASS_4"])))
combination_df
#> # A tibble: 2 x 3
#>   gene  `switch_0->3`           `switch_0->4`
#>   <chr> <chr>                   <chr>
#> 1 gene1 B-A,B-D,B-F,C-A,C-D,C-F B-E,C-E      
#> 2 gene2 C-A                     C-D,C-E
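
tidyr marks gather() as superseded in favor of pivot_longer(). As a minimal sketch, the reshaping step could be written as follows. The object names toy and long_df are introduced here for illustration only, the data are the first two genes from the question, and the rest of the pipeline is assumed to stay unchanged.

```r
library(tidyr)

# First two genes from the question, with the gene name as a regular column
toy <- data.frame(
  gene = c("gene1", "gene2"),
  A = c("CLASS_3", "CLASS_3"), B = c("CLASS_0", "CLASS_1"),
  C = c("CLASS_0", "CLASS_0"), D = c("CLASS_3", "CLASS_4"),
  E = c("CLASS_4", "CLASS_4"), F = c("CLASS_3", "CLASS_2")
)

# Same long layout as gather(key = class_name, value = class_value, A:F):
# one row per gene/condition pair
long_df <- pivot_longer(toy, cols = A:F,
                        names_to = "class_name",
                        values_to = "class_value")
```

pivot_longer() orders the rows by gene rather than by condition, which should not matter here because the pipeline groups by gene afterwards.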

Merge the data back to original df

df %>% left_join(combination_df, by = "gene")
#>         A       B       C       D       E       F  gene             switch_0->3
#> 1 CLASS_3 CLASS_0 CLASS_0 CLASS_3 CLASS_4 CLASS_3 gene1 B-A,B-D,B-F,C-A,C-D,C-F
#> 2 CLASS_3 CLASS_1 CLASS_0 CLASS_4 CLASS_4 CLASS_2 gene2                     C-A
#> 3 CLASS_1 CLASS_1 CLASS_2 CLASS_2 CLASS_1 CLASS_1 gene3                    <NA>
#> 4 CLASS_0 CLASS_0 CLASS_0 CLASS_0 CLASS_1 CLASS_0 gene4                    <NA>
#> 5 CLASS_2 CLASS_3 CLASS_2 CLASS_2 CLASS_2 CLASS_2 gene5                    <NA>
#>   switch_0->4
#> 1     B-E,C-E
#> 2     C-D,C-E
#> 3        <NA>
#> 4        <NA>
#> 5        <NA>

Created on 2022-01-05 by the reprex package (v2.0.1)
