littleworth

How to apply an operation to two nested columns using dplyr

I have the following data

dat <- structure(list(motif = "JUND", celltype_specific_genes = list(
    structure(list(genes = c("BDNF", "IFI202B", "JUN"), tissue = c("P-XXX", 
    "P-XXX", "P-XXX")), .Names = c("genes", "tissue"), row.names = c(NA, 
    -3L), class = c("tbl_df", "tbl", "data.frame"))), ipa_motif_genes = list(
    structure(list(genes = c("BCL3", "BDNF", "CCND1", "CDKN2A", 
    "CYBB", "DUSP1", "HMOX1", "IFNG", "IFI202B", "JUN", "JUNB", 
    "MMP9", "NOX4", "SAT1", "SOCS1", "TBX21", "VEGFA")), .Names = "genes", row.names = c(NA, 
    -17L), class = c("tbl_df", "tbl", "data.frame")))), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -1L), .Names = c("motif", 
"celltype_specific_genes", "ipa_motif_genes"))

library(dplyr)
dat 
#> # A tibble: 1 x 3
#>   motif celltype_specific_genes ipa_motif_genes  
#>   <chr> <list>                  <list>           
#> 1 JUND  <tibble [3 × 2]>        <tibble [17 × 1]>
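If the dput() output above is hard to read, the same one-row table can be rebuilt with tibble(). This is a sketch that assumes every gene has the tissue value "P-XXX", as in the data shown; dat_tbl is only an illustrative name.

```r
library(dplyr)  # dplyr re-exports tibble()

# One row per motif; each list column holds a small tibble of genes
dat_tbl <- tibble(
  motif = "JUND",
  celltype_specific_genes = list(
    tibble(genes = c("BDNF", "IFI202B", "JUN"), tissue = "P-XXX")  # tissue is recycled
  ),
  ipa_motif_genes = list(
    tibble(genes = c("BCL3", "BDNF", "CCND1", "CDKN2A", "CYBB", "DUSP1",
                     "HMOX1", "IFNG", "IFI202B", "JUN", "JUNB", "MMP9",
                     "NOX4", "SAT1", "SOCS1", "TBX21", "VEGFA"))
  )
)
dat_tbl
```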

In reality, the data has many more rows.

The nested columns contain the following vectors:

celltype_specific_genes <- c("BDNF", "IFI202B", "JUN")
ipa_motif_genes <- c("BCL3", "BDNF", "CCND1", "CDKN2A", 
        "CYBB", "DUSP1", "HMOX1", "IFNG", "IFI202B", "JUN", "JUNB", 
        "MMP9", "NOX4", "SAT1", "SOCS1", "TBX21", "VEGFA")
setdiff(ipa_motif_genes, celltype_specific_genes)
 #[1] "BCL3"   "CCND1"  "CDKN2A" "CYBB"   "DUSP1"  "HMOX1"  "IFNG"   "JUNB"   "MMP9"   "NOX4"   "SAT1"   "SOCS1"  "TBX21"  "VEGFA" 
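Note that setdiff() is not symmetric: setdiff(x, y) keeps the elements of x that are absent from y (in their original order, with duplicates dropped), so the argument order decides which "difference" you get. A small check with the two vectors above:

```r
celltype_specific_genes <- c("BDNF", "IFI202B", "JUN")
ipa_motif_genes <- c("BCL3", "BDNF", "CCND1", "CDKN2A", "CYBB", "DUSP1",
                     "HMOX1", "IFNG", "IFI202B", "JUN", "JUNB", "MMP9",
                     "NOX4", "SAT1", "SOCS1", "TBX21", "VEGFA")

# Genes in the IPA motif set that are not cell-type specific
in_ipa_only <- setdiff(ipa_motif_genes, celltype_specific_genes)
length(in_ipa_only)  # 14

# Reversed order: cell-type specific genes missing from the IPA set
setdiff(celltype_specific_genes, ipa_motif_genes)  # character(0)
```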

Using a dplyr pipe, I want to add a new column that contains the difference between the two nested columns, i.e. the genes in ipa_motif_genes that are not in celltype_specific_genes, also stored as a nested tibble.

How can I achieve that?


Update

I also have another vector that is not in dat:

full_genes <- c("JUN", "TRAPPC3", "SLC12A6", "IGBP1", "M6PR", "GM829", "APC", "HSD17B12", "CD59B", "OSTM1", "SLC10A6", "AKAP8", "CRP", "GHITM", "1110065P20RIK", "GM29685", "DSCAML1", "SNX15", "ZFP385C", "DNAJC25", "CORIN", "NUDT22", "MAP1A", "CHMP2A", "SDR16C5", "ADRA1D", "UPP2", "GM13242", "PLXNB2", "ABI1", "CACNB3", "MILL2", "DAPK3", "SPTA1", "ADNP", "H2AFX", "SLC22A14", "CIC", "PHACTR3", "2010107G12RIK", "KLC3", "SUSD4", "SLC25A15", "PTPRT", "RTEL1", "KCNU1", "SMIM13", "OLFR207", "SAMD4B", "SPIC")

How can I add another column with the genes in full_genes that are not in celltype_specific_genes?

I tried this, but it doesn't work:

Diff2 = map2(celltype_specific_genes, ~ tibble(setdiff(full_genes, .x$genes)))

Answers (1)

akrun

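We can use map2 (from purrr) to loop through the two list columns in parallel and, for each row, keep the genes in 'ipa_motif_genes' that are not present in 'celltype_specific_genes':

library(purrr)  # map2() and map() come from purrr

dat %>%
   mutate(Diff = map2(celltype_specific_genes, ipa_motif_genes, 
                # .x is the cell-type specific tibble, .y the IPA motif tibble
                ~ tibble(genes = setdiff(.y$genes, .x$genes)))) 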
# A tibble: 1 x 4
#   motif celltype_specific_genes ipa_motif_genes   Diff             
#  <chr> <list>                  <list>            <list>           
#1 JUND  <tibble [3 x 2]>        <tibble [17 x 1]> <tibble [14 x 1]>
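To check what ended up in the new list column, one option (an addition, not part of the answer above) is tidyr::unnest(), which expands each row's Diff tibble back into ordinary rows. A self-contained sketch that rebuilds a one-row dat with tibble() and assumes the tidyr package is installed:

```r
library(dplyr)
library(purrr)
library(tidyr)

dat <- tibble(
  motif = "JUND",
  celltype_specific_genes = list(tibble(genes = c("BDNF", "IFI202B", "JUN"))),
  ipa_motif_genes = list(tibble(genes = c(
    "BCL3", "BDNF", "CCND1", "CDKN2A", "CYBB", "DUSP1", "HMOX1", "IFNG",
    "IFI202B", "JUN", "JUNB", "MMP9", "NOX4", "SAT1", "SOCS1", "TBX21", "VEGFA")))
)

res <- dat %>%
  mutate(Diff = map2(celltype_specific_genes, ipa_motif_genes,
                     ~ tibble(genes = setdiff(.y$genes, .x$genes))))

# One row per remaining gene, still labelled with its motif
diff_long <- res %>%
  select(motif, Diff) %>%
  unnest(Diff)
diff_long
```

Naming the column inside tibble() (genes = ...) keeps the unnested column readable; without a name it would be called setdiff(.y$genes, .x$genes).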

For the second case, where an external vector is compared with a single list column, map is enough because only one column needs to be looped over:

dat %>% 
   mutate(Diff = map(celltype_specific_genes, ~ tibble(genes = setdiff(full_genes, .x$genes))))
# A tibble: 1 x 4
#   motif celltype_specific_genes ipa_motif_genes   Diff             
#  <chr> <list>                  <list>            <list>           
#1 JUND  <tibble [3 x 2]>        <tibble [17 x 1]> <tibble [49 x 1]>
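Since the real data has more rows, here is a sketch showing that both columns can be added in one mutate() call and that each row is handled independently. The second motif ("FOS"), its gene lists, and the shortened full_genes are made-up illustration data. Inside map(), full_genes is not a column, so it is looked up in the calling environment.

```r
library(dplyr)
library(purrr)

full_genes <- c("JUN", "TRAPPC3", "SLC12A6", "BDNF", "FOS")  # hypothetical, shortened

# Hypothetical two-row version of `dat`
dat2 <- tibble(
  motif = c("JUND", "FOS"),
  celltype_specific_genes = list(
    tibble(genes = c("BDNF", "IFI202B", "JUN")),
    tibble(genes = c("FOS", "EGR1"))
  ),
  ipa_motif_genes = list(
    tibble(genes = c("BCL3", "BDNF", "JUN", "JUNB")),
    tibble(genes = c("FOS", "EGR1", "NR4A1"))
  )
)

res2 <- dat2 %>%
  mutate(
    # Per row: motif genes that are not in the cell-type specific set
    Diff  = map2(celltype_specific_genes, ipa_motif_genes,
                 ~ tibble(genes = setdiff(.y$genes, .x$genes))),
    # Per row: external vector minus the cell-type specific set
    Diff2 = map(celltype_specific_genes,
                ~ tibble(genes = setdiff(full_genes, .x$genes)))
  )
res2
```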