beginner


Problem with upset plot intersection numbers

I have four sets A, B, C and D like below:

A <- c("ENSG00000103472", "ENSG00000130600", "ENSG00000177335", "ENSG00000177337", 
"ENSG00000178977", "ENSG00000180139", "ENSG00000180539", "ENSG00000187621", 
"ENSG00000188511", "ENSG00000197099", "ENSG00000203446", "ENSG00000203739", 
"ENSG00000203804", "ENSG00000204261", "ENSG00000204282", "ENSG00000204584", 
"ENSG00000205056", "ENSG00000205837", "ENSG00000206337", "ENSG00000213057")

B <- c("ENSG00000146521", "ENSG00000165511", "ENSG00000174171", "ENSG00000176659", 
"ENSG00000179428", "ENSG00000179840", "ENSG00000180539", "ENSG00000204261", 
"ENSG00000204282", "ENSG00000204949", "ENSG00000206337", "ENSG00000223534", 
"ENSG00000223552", "ENSG00000223725", "ENSG00000226252", "ENSG00000226751", 
"ENSG00000226777", "ENSG00000227066", "ENSG00000227260", "ENSG00000227403")

C <- c("ENSG00000167912", "ENSG00000168405", "ENSG00000172965", "ENSG00000177234", 
"ENSG00000177699", "ENSG00000177822", "ENSG00000179428", "ENSG00000179840", 
"ENSG00000180139", "ENSG00000181800", "ENSG00000181908", "ENSG00000183674", 
"ENSG00000189238", "ENSG00000196668", "ENSG00000196979", "ENSG00000197301", 
"ENSG00000203446", "ENSG00000203999", "ENSG00000204261", "ENSG00000206337")

D <- c("ENSG00000122043", "ENSG00000162888", "ENSG00000167912", "ENSG00000176320", 
"ENSG00000177699", "ENSG00000179253", "ENSG00000179428", "ENSG00000179840", 
"ENSG00000180539", "ENSG00000181800", "ENSG00000185433", "ENSG00000188511", 
"ENSG00000189238", "ENSG00000197301", "ENSG00000205056", "ENSG00000205562", 
"ENSG00000213279", "ENSG00000214922", "ENSG00000215533", "ENSG00000218018")

An UpSet plot gave me the following result:

library(UpSetR)
mine <- list("A" = A,
             "B" = B,
             "C" = C,
             "D" = D)

upset(fromList(mine), keep.order = TRUE)

[Image: UpSet plot of all intersections among A, B, C and D]

But I'm only interested in the intersections between specific pairs of sets: A & B, A & C and A & D. So I specified them like this:

upset(fromList(mine), intersections = list(list("A"),list("B"),list("C"),
                                           list("D"),list("A", "B"), 
                                           list("A", "C"),
                                           list("A", "D")), keep.order = TRUE)

[Image: UpSet plot restricted to A, B, C, D, A & B, A & C and A & D]

But A & B have 4 elements in common, A & C have 4, and A & D have 3. Why does the UpSet plot above show the wrong numbers?

How can I make it show the correct number of shared elements for each pair? I don't want the intersection of all sets.


Answers (1)

TarJae


The numbers are correct! The discrepancy comes from how the size of an intersection is defined.

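There are three ways to calculate the size of a set intersection:

  1. "distinct" mode: elements that are in all of the selected sets and in none of the other sets
  2. "intersect" mode: elements that are in all of the selected sets, whether or not they also occur in other sets
  3. "union" mode: elements that are in at least one of the selected sets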

UpSetR uses the "distinct" mode, so the A & B bar counts only the elements that are in A and B and in neither C nor D.

The "intersect" mode is probably what you expect.
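
To make the difference concrete, here is a minimal base R sketch that computes both sizes for the three pairs from the question. The helper names `ens`, `intersect_size` and `distinct_size` are made up for this illustration and are not part of UpSetR; the sets are rebuilt from the numeric suffixes of the IDs in the question only to keep the block short.

```r
# Rebuild the IDs from the question: "ENSG00000" + six-digit suffix
ens <- function(x) paste0("ENSG00000", x)

sets <- list(
  A = ens(c(103472, 130600, 177335, 177337, 178977, 180139, 180539, 187621,
            188511, 197099, 203446, 203739, 203804, 204261, 204282, 204584,
            205056, 205837, 206337, 213057)),
  B = ens(c(146521, 165511, 174171, 176659, 179428, 179840, 180539, 204261,
            204282, 204949, 206337, 223534, 223552, 223725, 226252, 226751,
            226777, 227066, 227260, 227403)),
  C = ens(c(167912, 168405, 172965, 177234, 177699, 177822, 179428, 179840,
            180139, 181800, 181908, 183674, 189238, 196668, 196979, 197301,
            203446, 203999, 204261, 206337)),
  D = ens(c(122043, 162888, 167912, 176320, 177699, 179253, 179428, 179840,
            180539, 181800, 185433, 188511, 189238, 197301, 205056, 205562,
            213279, 214922, 215533, 218018))
)

# "intersect" mode: elements shared by the chosen sets, other sets ignored
intersect_size <- function(sets, members) {
  length(Reduce(intersect, sets[members]))
}

# "distinct" mode: elements in all chosen sets and in none of the other sets
distinct_size <- function(sets, members) {
  shared <- Reduce(intersect, sets[members])
  others <- unlist(sets[setdiff(names(sets), members)])
  length(setdiff(shared, others))
}

pairs <- list(c("A", "B"), c("A", "C"), c("A", "D"))

result <- data.frame(
  pair      = vapply(pairs, paste, character(1), collapse = " & "),
  intersect = vapply(pairs, function(p) intersect_size(sets, p), integer(1)),
  distinct  = vapply(pairs, function(p) distinct_size(sets, p), integer(1))
)
print(result)
#    pair intersect distinct
# 1 A & B         4        1
# 2 A & C         4        2
# 3 A & D         3        2
```

The "intersect" column reproduces the 4, 4 and 3 from the question. The "distinct" column is smaller because, for example, three of the four genes shared by A and B also occur in C or D.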

The ComplexHeatmap and ComplexUpset packages allow you to choose which mode to use.
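
As a rough sketch of that option, assuming the Bioconductor package ComplexHeatmap is installed: `make_comb_mat()` accepts a `mode` argument, and the resulting combination matrix can be subset to the combinations of interest before plotting. The `ens` helper and the `wanted` vector are only for this illustration, and the combination codes assume the sets keep the order A, B, C, D (check with `set_name(m)`).

```r
library(ComplexHeatmap)

# Rebuild the IDs from the question: "ENSG00000" + six-digit suffix
ens <- function(x) paste0("ENSG00000", x)

sets <- list(
  A = ens(c(103472, 130600, 177335, 177337, 178977, 180139, 180539, 187621,
            188511, 197099, 203446, 203739, 203804, 204261, 204282, 204584,
            205056, 205837, 206337, 213057)),
  B = ens(c(146521, 165511, 174171, 176659, 179428, 179840, 180539, 204261,
            204282, 204949, 206337, 223534, 223552, 223725, 226252, 226751,
            226777, 227066, 227260, 227403)),
  C = ens(c(167912, 168405, 172965, 177234, 177699, 177822, 179428, 179840,
            180139, 181800, 181908, 183674, 189238, 196668, 196979, 197301,
            203446, 203999, 204261, 206337)),
  D = ens(c(122043, 162888, 167912, 176320, 177699, 179253, 179428, 179840,
            180539, 181800, 185433, 188511, 189238, 197301, 205056, 205562,
            213279, 214922, 215533, 218018))
)

# mode = "intersect" counts every element shared by the chosen sets,
# even if it also occurs in other sets
m <- make_comb_mat(sets, mode = "intersect")

# One digit per set in the order A, B, C, D: "1100" is A & B, and so on
wanted <- c("1000", "0100", "0010", "0001", "1100", "1010", "1001")
m_sub <- m[comb_name(m) %in% wanted]

comb_size(m_sub)  # sizes of the kept combinations
UpSet(m_sub)      # draw the restricted UpSet plot
```

ComplexUpset offers a comparable choice through its own `mode` argument; see its documentation for the exact mode names.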

I found a very good explanation by Jakob Rosenthal at https://github.com/hms-dbmi/UpSetR/issues/72, especially the graphic posted there.
