Reputation: 5169
I have the following data:
dat <- structure(list(motif = "JUND", celltype_specific_genes = list(
structure(list(genes = c("BDNF", "IFI202B", "JUN"), tissue = c("P-XXX",
"P-XXX", "P-XXX")), .Names = c("genes", "tissue"), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))), ipa_motif_genes = list(
structure(list(genes = c("BCL3", "BDNF", "CCND1", "CDKN2A",
"CYBB", "DUSP1", "HMOX1", "IFNG", "IFI202B", "JUN", "JUNB",
"MMP9", "NOX4", "SAT1", "SOCS1", "TBX21", "VEGFA")), .Names = "genes", row.names = c(NA,
-17L), class = c("tbl_df", "tbl", "data.frame")))), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -1L), .Names = c("motif",
"celltype_specific_genes", "ipa_motif_genes"))
library(dplyr)
dat
#> # A tibble: 1 x 3
#> motif celltype_specific_genes ipa_motif_genes
#> <chr> <list> <list>
#> 1 JUND <tibble [3 × 2]> <tibble [17 × 1]>
In reality I have more rows.
The nested columns contain the following vectors:
celltype_specific_genes <- c("BDNF", "IFI202B", "JUN")
ipa_motif_genes <- c("BCL3", "BDNF", "CCND1", "CDKN2A",
"CYBB", "DUSP1", "HMOX1", "IFNG", "IFI202B", "JUN", "JUNB",
"MMP9", "NOX4", "SAT1", "SOCS1", "TBX21", "VEGFA")
setdiff(ipa_motif_genes, celltype_specific_genes)
#[1] "BCL3" "CCND1" "CDKN2A" "CYBB" "DUSP1" "HMOX1" "IFNG" "JUNB" "MMP9" "NOX4" "SAT1" "SOCS1" "TBX21" "VEGFA"
Using the dplyr pipe, I want to add a new column, also nested, that contains the difference between celltype_specific_genes and ipa_motif_genes, as in the setdiff() call above.
How can I achieve that?
Update
I also have another vector that is not in dat:
full_genes <- c("JUN", "TRAPPC3", "SLC12A6", "IGBP1", "M6PR", "GM829",
"APC", "HSD17B12", "CD59B", "OSTM1", "SLC10A6", "AKAP8", "CRP",
"GHITM", "1110065P20RIK", "GM29685", "DSCAML1", "SNX15", "ZFP385C",
"DNAJC25", "CORIN", "NUDT22", "MAP1A", "CHMP2A", "SDR16C5", "ADRA1D",
"UPP2", "GM13242", "PLXNB2", "ABI1", "CACNB3", "MILL2", "DAPK3",
"SPTA1", "ADNP", "H2AFX", "SLC22A14", "CIC", "PHACTR3", "2010107G12RIK",
"KLC3", "SUSD4", "SLC25A15", "PTPRT", "RTEL1", "KCNU1", "SMIM13",
"OLFR207", "SAMD4B", "SPIC")
How can I add another column that contains the difference between full_genes and celltype_specific_genes?
I tried this, but it doesn't work:
Diff2 = map2(celltype_specific_genes, ~ tibble(setdiff(full_genes, .x$genes)))
Upvotes: 2
Views: 113
Reputation: 886938
We can use map2 (from purrr) to loop through the two list columns in parallel and get the elements of 'ipa_motif_genes' that are not present in 'celltype_specific_genes':
library(purrr)  # map2() and map() come from purrr, not dplyr
dat %>%
  mutate(Diff = map2(celltype_specific_genes, ipa_motif_genes,
                     ~ tibble(setdiff(.y$genes, .x$genes))))
# A tibble: 1 x 4
# motif celltype_specific_genes ipa_motif_genes Diff
# <chr> <list> <list> <list>
#1 JUND <tibble [3 x 2]> <tibble [17 x 1]> <tibble [14 x 1]>
For the second case, comparing an external vector with a column in the dataset, only one list column is iterated over, so map is enough:
dat %>%
  mutate(Diff = map(celltype_specific_genes,
                    ~ tibble(setdiff(full_genes, .x$genes))))
# A tibble: 1 x 4
# motif celltype_specific_genes ipa_motif_genes Diff
# <chr> <list> <list> <list>
#1 JUND <tibble [3 x 2]> <tibble [17 x 1]> <tibble [49 x 1]>
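The attempt in the question most likely fails because map2 expects two inputs (.x and .y) before the function, so the formula is taken as the second input and no function is left to apply. As a minimal sketch, both columns can be built in one mutate call. The names dat_small and full_genes_small are hypothetical, shortened stand-ins for dat and full_genes, and naming the inner column genes is an assumption added for readability, not part of the code above:

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical one-row stand-in for dat, using a few gene names from the question
dat_small <- tibble(
  motif = "JUND",
  celltype_specific_genes = list(tibble(genes = c("BDNF", "IFI202B", "JUN"))),
  ipa_motif_genes = list(tibble(genes = c("BCL3", "BDNF", "JUN", "JUNB")))
)

# Hypothetical shortened stand-in for full_genes
full_genes_small <- c("JUN", "TRAPPC3", "SLC12A6")

res <- dat_small %>%
  mutate(
    # Two list columns: map2() pairs them row by row as .x and .y
    Diff = map2(celltype_specific_genes, ipa_motif_genes,
                ~ tibble(genes = setdiff(.y$genes, .x$genes))),
    # One list column plus an external vector: map() with .x only
    Diff2 = map(celltype_specific_genes,
                ~ tibble(genes = setdiff(full_genes_small, .x$genes)))
  )

res$Diff[[1]]$genes
#> [1] "BCL3" "JUNB"
res$Diff2[[1]]$genes
#> [1] "TRAPPC3" "SLC12A6"
```

Giving the inner column an explicit name avoids the auto-generated name that tibble() would otherwise derive from the setdiff() expression.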
Upvotes: 2