Reputation: 315
I'm making some comparisons with UpSetR, and I'd like to save the lists of elements that fall into each intersection. Is this possible? I can't find it anywhere...
It would be pretty tedious to do it manually (many lists), and since they're calculated anyway not being able to save them is frustrating
Upvotes: 10
Views: 8045
Reputation: 453
I like @Javier answer (which I actually used in my code, so thanks!). I needed to perform further work on that, and I found a quick and dirty way of doing the same
data <- data.frame(
entry = paste0("Entry.", 1:10),
"A" = c(0,0,0,0,1,0,1,1,0,0),
"B" = c(1,0,0,0,1,1,1,1,1,0),
"C" = c(1,1,1,1,0,0,1,0,1,1))
aggregate(data$Entry,by=list(data$A,data$B,data$C),FUN="length")
Upvotes: 0
Reputation: 121
Here is my solution. It is bio-related, but should be easy to translate to other fields. I start with a list of vectors. In my case a list of genes (list of characters) belonging to different signatures (different sets).
str(list_filter)
List of 9
$ CellAge_Induces : chr [1:153] "AAK1" "ABI3" "ADCK5" "AGT" ...
$ CellAge_Inhibits : chr [1:121] "ACLY" "AKR1B1" "ASPH" "ATF7IP" ...
$ CLASSICAL_SASP : chr [1:38] "BGN" "CCL2" "CCL20" "COL1A1" ...
$ FRIDMAN_SENESCENCE_UP : chr [1:77] "ALDH1A3" "CCND1" "CD44" "CDKN1A" ...
$ ISM_SCORE : chr [1:128] "HSH2D" "OTOF" "TRIM69" "PSME1" ...
$ MOSERLE_IFNA_RESPONSE : chr [1:31] "CD274" "CMPK2" "CXCL10" "DDX58" ...
$ REACTOME_SENESCENCE_SASP: chr [1:110] "ANAPC1" "ANAPC10" "ANAPC11" "ANAPC15" ...
$ SAEPHIA_CURATED_SASP : chr [1:38] "IL1A" "IL1B" "CXCL10" "CXCL1" ...
$ senmayo : chr [1:125] "ACVR1B" "ANG" "ANGPT1" "ANGPTL4" ...
From this list, I generate two tables: One with the unique gene names
df2 <- data.frame(gene=unique(unlist(list_filter)))
head(df2)
gene
1 AAK1
2 ABI3
3 ADCK5
4 AGT
5 AKT1
6 ALOX15B
dim(df2)
[1] 671 1
One is simply a "dataframe" version of the list. With every gene in the signature and the name of every signature (set).
df1 <- lapply(list_filter,function(x){
data.frame(gene = x)
}) %>%
bind_rows(.id = "path")
head(df1)
path gene
1 CellAge_Induces AAK1
2 CellAge_Induces ABI3
3 CellAge_Induces ADCK5
4 CellAge_Induces AGT
5 CellAge_Induces AKT1
6 CellAge_Induces ALOX15B
dim(df1)
[1] 821 2
now I iterate the search of each unique gene name and save the identity of the signatures in a column.
df_int <- lapply(df2$gene,function(x){
# pull the name of the intersections
intersection <- df1 %>%
dplyr::filter(gene==x) %>%
arrange(path) %>%
pull("path") %>%
paste0(collapse = "|")
# build the dataframe
data.frame(gene = x,int = intersection)
}) %>%
bind_rows()
head(df_int,n=20)
gene int
1 AAK1 CellAge_Induces
2 ABI3 CellAge_Induces
3 ADCK5 CellAge_Induces
4 AGT CellAge_Induces
5 AKT1 CellAge_Induces
6 ALOX15B CellAge_Induces
7 AR CellAge_Induces
8 ARPC1B CellAge_Induces
9 ASF1A CellAge_Induces
10 AXL CellAge_Induces|senmayo
11 BHLHE40 CellAge_Induces
12 BLK CellAge_Induces
13 BRAF CellAge_Induces
14 BRD7 CellAge_Induces
15 CAV1 CellAge_Induces
16 CCND1 CellAge_Induces|FRIDMAN_SENESCENCE_UP
17 CDK18 CellAge_Induces
18 CDKN1A CellAge_Induces|FRIDMAN_SENESCENCE_UP|REACTOME_SENESCENCE_SASP
19 CDKN1C CellAge_Induces|FRIDMAN_SENESCENCE_UP
20 CDKN1B CellAge_Induces|REACTOME_SENESCENCE_SASP
dim(df_int)
[1] 671 2
the dataframe can be summarised and compared to the output provided by calling
df_int %>%
group_by(int) %>%
summarise(n=n()) %>%
arrange(desc(n))
# A tibble: 47 × 2
int n
<chr> <int>
1 CellAge_Induces 126
2 CellAge_Inhibits 110
3 REACTOME_SENESCENCE_SASP 95
4 ISM_SCORE 93
5 senmayo 77
6 FRIDMAN_SENESCENCE_UP 44
7 ISM_SCORE|MOSERLE_IFNA_RESPONSE 27
8 CLASSICAL_SASP|senmayo 12
9 CLASSICAL_SASP 8
10 SAEPHIA_CURATED_SASP 8
# … with 37 more rows
# ℹ Use `print(n = ...)` to see more rows
upset(fromList(list_filter),nsets = 10)
Upvotes: 5
Reputation: 121
Here is my take at extracting the different intersections together with the list of elements in them.
The main idea is to paste all the 0's and 1's from the binary table to create unique identifiers for each intersection and them use the dplyr::group_by function to extract the information
data <- data.frame(
entry = paste0("Entry.", 1:10),
"A" = c(0,0,0,0,1,0,1,1,0,0),
"B" = c(1,0,0,0,1,1,1,1,1,0),
"C" = c(1,1,1,1,0,0,1,0,1,1)
)
# NOT REQUIRED. Only to confirm that upset works with these data
upset(data)
You can then identify the intersections by pasting all the binary columns. I use the unite convenience function for this:
NB: you may have to change this depending on whether your data has row names or a column with names
data_with_intersection <- data %>%
unite(col = "intersection", -c("entry"), sep = "")
From here, you can easily calculate the size of each intersection:
# Table of intersections and the number of entries
data_with_intersection %>%
group_by(intersection) %>%
summarise(n = n()) %>%
arrange(desc(n))
Or even extract the list of entries/elements in each intersection:
# List of intersections and their entries
data_with_intersection %>%
group_by(intersection) %>%
summarise(list = list(entry)) %>%
mutate(list = setNames(list, intersection)) %>%
pull(list)
Upvotes: 5
Reputation: 56149
There is no ready upSetR function for this (yet). But, it is possible to extract it:
library(UpSetR)
# Example input as list, expected output is 1 and 5:
listInput <- list(one = c(1, 2, 3, 5, 7, 8, 11, 12, 13),
two = c(1, 2, 4, 5, 10),
three = c(1, 5, 6, 7, 8, 9, 10, 12, 13))
When assigned upset returns a value, which also includes the data:
x <- upset(fromList(listInput))
x$New_data
# one two three
# 1 1 1 1
# 2 1 1 0
# 3 1 0 0
# 4 1 1 1
# 5 1 0 1
# 6 1 0 1
# 7 1 0 0
# 8 1 0 1
# 9 1 0 1
# 10 0 1 0
# 11 0 1 1
# 12 0 0 1
# 13 0 0 1
From here we can see it is 1st and the 4th rows are found in all three sets. The order of items are defined based on the order they appear in the list, see:
x1 <- unlist(listInput, use.names = FALSE)
x1 <- x1[ !duplicated(x1) ]
x1
# [1] 1 2 3 5 7 8 11 12 13 4 10 6 9
Now we know the rownumbers from "New_data" refer to in our list. So, as we have 3 columns, filter rows where sum is 3:
x1[ rowSums(x$New_data) == 3 ]
# [1] 1 5
Or we could just use Reduce:
Reduce(intersect, listInput)
# [1] 1 5
Upvotes: 3