kin182

Reputation: 403

How to extract characters based on content of each row in a list?

I have a list like this:

list1 = list(data.frame("Gene" = c("A","B","C","D","E"), "Sample" = "S1"),
             data.frame("Gene" = c("B","C","D","F","G"), "Sample" = "S2"),
             data.frame("Gene" = c("A","C","D","E","F"), "Sample" = "S3"))

names(list1) = c("S1","S2","S3")

I would like to report which Samples contain each Gene across the entire list1. For example:

$A
"S1","S3"

$B
"S1","S2"

$C
"S1","S2","S3"

$D
"S1","S2","S3"

$E
"S1","S3"

$F
"S2","S3"

$G
"S2"

No Gene is duplicated within a single data frame, but the same Gene can appear in several data frames. For each Gene, I want to find the Samples in which it is present. Could someone help? Thank you.

Upvotes: 1

Views: 40

Answers (3)

akrun

Reputation: 887203

We could rbind the list elements into a single data frame and then use split from base R:

with(do.call(rbind, list1), split(Sample, Gene))
#$A
#[1] "S1" "S3"

#$B
#[1] "S1" "S2"

#$C
#[1] "S1" "S2" "S3"

#$D
#[1] "S1" "S2" "S3"

#$E
#[1] "S1" "S3"

#$F
#[1] "S2" "S3"

#$G
#[1] "S2"
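To make the two steps easier to follow, here is a minimal, self-contained sketch of the same idea with the intermediate data frame kept in a variable. The names `combined` and `result` are only illustrative, and the `as.character()` calls are an assumption on my part so that the result is a list of character vectors even on R versions where `data.frame()` creates factors by default.

```r
# Rebuild the example list from the question
list1 <- list(
  S1 = data.frame(Gene = c("A", "B", "C", "D", "E"), Sample = "S1"),
  S2 = data.frame(Gene = c("B", "C", "D", "F", "G"), Sample = "S2"),
  S3 = data.frame(Gene = c("A", "C", "D", "E", "F"), Sample = "S3")
)

# Step 1: stack all data frames into one long data frame
combined <- do.call(rbind, list1)

# Step 2: group the Sample values by Gene
# (as.character() guards against factor columns on older R versions)
result <- split(as.character(combined$Sample), as.character(combined$Gene))

result$A
result$G
```

Each element of `result` is named after a Gene and holds the Samples in which that Gene occurs, in the order the data frames appear in list1.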

Upvotes: 0

JasonAizkalns

Reputation: 20463

If you would prefer the output in more of a tibble or data.frame format you can use:

library(tidyverse)

bind_rows(list1) %>%
  group_by(Gene) %>%
  summarise(Samples = toString(Sample))

#> # A tibble: 7 x 2
#>   Gene  Samples   
#>   <chr> <chr>     
#> 1 A     S1, S3    
#> 2 B     S1, S2    
#> 3 C     S1, S2, S3
#> 4 D     S1, S2, S3
#> 5 E     S1, S3    
#> 6 F     S2, S3    
#> 7 G     S2

Or you could nest them for further processing:

bind_rows(list1) %>%
  group_by(Gene) %>%
  nest()

#> # A tibble: 7 x 2
#>   Gene  data            
#>   <chr> <list>          
#> 1 A     <tibble [2 x 1]>
#> 2 B     <tibble [2 x 1]>
#> 3 C     <tibble [3 x 1]>
#> 4 D     <tibble [3 x 1]>
#> 5 E     <tibble [2 x 1]>
#> 6 F     <tibble [2 x 1]>
#> 7 G     <tibble [1 x 1]>
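If you want the named list from the question rather than a tibble, one possible sketch is to collect the Samples into a list column and then name it by Gene. This only needs dplyr; `per_gene` and `result` are illustrative names, and `stringsAsFactors = FALSE` is set explicitly as an assumption so that the columns are character on any R version.

```r
library(dplyr)

# Rebuild the example list from the question with character columns
list1 <- list(
  S1 = data.frame(Gene = c("A", "B", "C", "D", "E"), Sample = "S1",
                  stringsAsFactors = FALSE),
  S2 = data.frame(Gene = c("B", "C", "D", "F", "G"), Sample = "S2",
                  stringsAsFactors = FALSE),
  S3 = data.frame(Gene = c("A", "C", "D", "E", "F"), Sample = "S3",
                  stringsAsFactors = FALSE)
)

# One row per Gene, with all of its Samples stored in a list column
per_gene <- bind_rows(list1) %>%
  group_by(Gene) %>%
  summarise(Samples = list(Sample))

# Turn the list column into a named list keyed by Gene
result <- setNames(per_gene$Samples, per_gene$Gene)

result$C
```

Unlike `toString()`, the list column keeps each Sample as a separate element, so nothing has to be split apart again later.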

Upvotes: 0

Onyambu

Reputation: 79238

You can first use do.call(rbind, ...) to combine the list into one data frame, then unstack that data frame:

unstack(do.call(rbind, list1), Sample ~ Gene)
$A
[1] "S1" "S3"

$B
[1] "S1" "S2"

$C
[1] "S1" "S2" "S3"

$D
[1] "S1" "S2" "S3"

$E
[1] "S1" "S3"

$F
[1] "S2" "S3"

$G
[1] "S2"
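As a variation, if the data frames did not carry a Sample column at all, the names of list1 could serve as the sample labels. The sketch below assumes exactly that hypothetical situation (it is not the data from the question as posted) and uses `stack()` to build the long Gene/Sample table; `genes`, `long` and `result` are illustrative names.

```r
# Hypothetical variant of the example: no Sample column,
# the sample label lives only in the list names
list1 <- list(
  S1 = data.frame(Gene = c("A", "B", "C", "D", "E")),
  S2 = data.frame(Gene = c("B", "C", "D", "F", "G")),
  S3 = data.frame(Gene = c("A", "C", "D", "E", "F"))
)

# Pull out the Gene column of each data frame as a character vector
genes <- lapply(list1, function(df) as.character(df$Gene))

# stack() returns a data frame with `values` (the genes)
# and `ind` (the list name each gene came from)
long <- stack(genes)

# Group the list names by gene
result <- split(as.character(long$ind), as.character(long$values))

result$F
```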

Upvotes: 2
