Stricken1

R "Error in draw.quad.venn, Impossible: produces negative area" despite numbers being correct

I'm trying to generate a four-way Venn diagram using draw.quad.venn from the VennDiagram package in R, but it keeps throwing this error:

ERROR [2019-05-14 11:28:24] Impossible: a7  <- n234 - a6 produces negative area
Error in draw.quad.venn(length(gene_lists[[1]]), length(gene_lists[[2]]),  : 
  Impossible: a7  <- n234 - a6 produces negative area

I'm using 4 different lists of genes as the input. calculate.overlap works fine, and I then get the numbers by applying length(x) to each of the overlap values, which are returned as a list. I pass all of the overlap values, along with the appropriate total group sizes, to draw.quad.venn, but it keeps claiming that one of the groups is impossible because it produces a negative number.

I've checked the numbers manually and they clearly add up to the correct values. I've also tested the script on a random set of 20000 genes, generated using something similar to the script below, and it works fine, i.e. it generates a four-way Venn diagram. There are no differences between the randomly generated gene lists and the ones I've curated from actual results files, apart from their sizes. A minimal working example can be seen below:

# working example that fails
library(VennDiagram)
# get vector of 10000 elements (representative of gene list)
values <- c(1:10000)
# generate 4 subsets by random sampling
list_1 <- sample(values, size = 5000, replace = FALSE)
list_2 <- sample(values, size = 4000, replace = FALSE)
list_3 <- sample(values, size = 3000, replace = FALSE)
list_4 <- sample(values, size = 2000, replace = FALSE)
# compile them in to a list
lists <- list(list_1, list_2, list_3, list_4)
# find overlap between all possible combinations (11 plus 4 unique to each list = 15 total)
overlap <- calculate.overlap(lists)
# get the lengths of each list - these will be the numbers used for the Venn diagram
overlap_values <- lapply(overlap, function(x) length(x))
# rename overlap values (easier to identify which groups are intersecting)
names(overlap_values) <- c("n1234", "n123", "n124", "n134", "n234", "n12", "n13", "n14", "n23", "n24", "n34", "n1", "n2", "n3", "n4")
# generate the venn diagram
draw.quad.venn(length(lists[[1]]), length(lists[[2]]), length(lists[[3]]), length(lists[[4]]), overlap_values$n12,
               overlap_values$n13, overlap_values$n14, overlap_values$n23, overlap_values$n24, overlap_values$n34,
               overlap_values$n123, overlap_values$n124, overlap_values$n134, overlap_values$n234, overlap_values$n1234)

I expect a four-way Venn diagram regardless of whether some groups are 0. Those groups should still be drawn, just labelled as 0. This is what it should look like:

[Image: four-way Venn diagram, with gene list sizes in each group]

I'm not sure if it's because I have 0 values in the real data, i.e. certain groups where there is no overlap. Is there any way to force draw.quad.venn() to accept any values? If not, is there another package that I can use to achieve the same result? Any help greatly appreciated!


Answers (2)

vqf

I have had a look at the source code of the package. In case you are still interested in the reason for the error: there are two ways to pass data to draw.quad.venn. One is the nxxxx form (e.g., n134) and the other is the an form (e.g., a5). In these examples, n134 counts the elements that belong to at least groups 1, 3 and 4, whereas a5 counts the elements that belong only to groups 1, 3 and 4. The mapping between the two forms is convoluted; for instance, a6 corresponds to n1234, which means that n134 = a5 + a6.

The problem is that calculate.overlap returns its values in the an form, whereas by default draw.quad.venn expects them in the nxxxx form. To use the values from calculate.overlap, set direct.area to TRUE and pass the result of calculate.overlap in the area.vector parameter. For instance,

library(VennDiagram)
# exclusive region contents (a1..a15) for four small example sets
tmp <- calculate.overlap(list(a=c(1, 2, 3, 4, 10), b=c(3, 4, 5, 6), c=c(4, 6, 7, 8, 9), d=c(4, 8, 1, 9)))
# number of elements in each region
overlap_values <- lapply(tmp, function(x) length(x))
# pass the region sizes directly, in the order a1 to a15
draw.quad.venn(area.vector = c(overlap_values$a1, overlap_values$a2, overlap_values$a3, overlap_values$a4, 
                               overlap_values$a5, overlap_values$a6, overlap_values$a7, overlap_values$a8, 
                               overlap_values$a9, overlap_values$a10, overlap_values$a11, overlap_values$a12, 
                               overlap_values$a13, overlap_values$a14, overlap_values$a15), direct.area = TRUE, category = c('a', 'b', 'c', 'd'))

[Image: the resulting four-way Venn diagram]
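
As a rough illustration of the difference between the two forms, the sketch below computes the cumulative nxxxx counts directly from the example sets with base R and checks the relation n134 = a5 + a6. The helper n_of is a hypothetical name introduced here, not part of VennDiagram, and the draw.quad.venn call at the end only runs if the package is installed.

```r
# Example sets from the code above (no duplicates within a set)
sets <- list(a = c(1, 2, 3, 4, 10), b = c(3, 4, 5, 6),
             c = c(4, 6, 7, 8, 9), d = c(4, 8, 1, 9))

# Hypothetical helper: cumulative ("n" form) size, i.e. the number of
# elements that belong to at least all of the sets listed in idx
n_of <- function(idx) length(unique(Reduce(intersect, sets[idx])))

n134  <- n_of(c(1, 3, 4))
n1234 <- n_of(1:4)

# Exclusive ("a" form) size: in sets 1, 3 and 4 but not in set 2
a5 <- length(setdiff(Reduce(intersect, sets[c(1, 3, 4)]), sets[[2]]))
a6 <- n1234

# The relation described above
stopifnot(n134 == a5 + a6)

# With cumulative counts, the default (non direct.area) call can be used
if (requireNamespace("VennDiagram", quietly = TRUE)) {
  VennDiagram::draw.quad.venn(
    area1 = n_of(1), area2 = n_of(2), area3 = n_of(3), area4 = n_of(4),
    n12 = n_of(c(1, 2)), n13 = n_of(c(1, 3)), n14 = n_of(c(1, 4)),
    n23 = n_of(c(2, 3)), n24 = n_of(c(2, 4)), n34 = n_of(c(3, 4)),
    n123 = n_of(c(1, 2, 3)), n124 = n_of(c(1, 2, 4)),
    n134 = n_of(c(1, 3, 4)), n234 = n_of(c(2, 3, 4)),
    n1234 = n_of(1:4),
    category = names(sets)
  )
}
```

In the cumulative form, a three-way count can never be smaller than the four-way count, because every element shared by all four groups is also shared by any three of them. That is why passing exclusive counts under nxxxx names can trigger the "produces negative area" error.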

If you are interested in something simpler and more flexible, I made the nVennR package for this type of problem:

library(nVennR)
g1 <- c('AF029684', 'M28825', 'M32074', 'NM_000139', 'NM_000173', 'NM_000208', 'NM_000316', 'NM_000318', 'NM_000450', 'NM_000539', 'NM_000587', 'NM_000593', 'NM_000638', 'NM_000655', 'NM_000789', 'NM_000873', 'NM_000955', 'NM_000956', 'NM_000958', 'NM_000959', 'NM_001060', 'NM_001078', 'NM_001495', 'NM_001627', 'NM_001710', 'NM_001716')
g2 <- c('NM_001728', 'NM_001835', 'NM_001877', 'NM_001954', 'NM_001992', 'NM_002001', 'NM_002160', 'NM_002162', 'NM_002258', 'NM_002262', 'NM_002303', 'NM_002332', 'NM_002346', 'NM_002347', 'NM_002349', 'NM_002432', 'NM_002644', 'NM_002659', 'NM_002997', 'NM_003032', 'NM_003246', 'NM_003247', 'NM_003248', 'NM_003259', 'NM_003332', 'NM_003383', 'NM_003734', 'NM_003830', 'NM_003890', 'NM_004106', 'AF029684', 'M28825', 'M32074', 'NM_000139', 'NM_000173', 'NM_000208', 'NM_000316', 'NM_000318', 'NM_000450', 'NM_000539')
g3 <- c('NM_000655', 'NM_000789', 'NM_004107', 'NM_004119', 'NM_004332', 'NM_004334', 'NM_004335', 'NM_004441', 'NM_004444', 'NM_004488', 'NM_004828', 'NM_005214', 'NM_005242', 'NM_005475', 'NM_005561', 'NM_005565', 'AF029684', 'M28825', 'M32074', 'NM_005567', 'NM_003734', 'NM_003830', 'NM_003890', 'NM_004106', 'AF029684', 'NM_005582', 'NM_005711', 'NM_005816', 'NM_005849', 'NM_005959', 'NM_006138', 'NM_006288', 'NM_006378', 'NM_006500', 'NM_006770', 'NM_012070', 'NM_012329', 'NM_013269', 'NM_016155', 'NM_018965', 'NM_021950', 'S69200', 'U01351', 'U08839', 'U59302')
g4 <- c('NM_001728', 'NM_001835', 'NM_001877', 'NM_001954', 'NM_005214', 'NM_005242', 'NM_005475', 'NM_005561', 'NM_005565', 'ex1', 'ex2', 'NM_003890', 'NM_004106', 'AF029684', 'M28825', 'M32074', 'NM_000139', 'NM_000173', 'NM_000208', 'NM_000316', 'NM_000318', 'NM_000450', 'NM_000539')
myV <- plotVenn(list(g1=g1, g2=g2, g3=g3, g4=g4))
myV <- plotVenn(nVennObj = myV)
myV <- plotVenn(nVennObj = myV)

The last command is repeated on purpose. The result:

[Image: four-way Venn diagram produced by nVennR]

You can then explore the intersections:

> getVennRegion(myV, c('g1', 'g2', 'g4'))
[1] "NM_000139" "NM_000173" "NM_000208" "NM_000316" "NM_000318" "NM_000450" "NM_000539"

There is a vignette with more information.


Stricken1

Nothing I tried solved the error with draw.quad.venn in the VennDiagram package. There's something wrong with the way it's written. As long as all of the numbers in each of the 4 ellipses add up to the total number of elements in that particular list, the Venn diagram is valid. For some reason, VennDiagram will only accept data where fewer intersections lead to higher numbers, e.g. the intersection of groups 1, 2 and 3 MUST be higher than the intersection of all 4 groups. This doesn't represent real-world data. It's entirely possible for groups 1, 2 and 3 to not intersect at all, whilst all 4 groups do intersect. In a Venn diagram, all of the numbers are independent, and represent the total number of elements common at each intersection. They do not have to have any bearing on each other.

I had a look at the eulerr package, but actually found a very simple method of plotting the Venn diagram using venn() from gplots, as follows:

# simple 4 way Venn diagram using gplots
library(gplots)
# get some mock data
values <- c(1:20000)
list_1 <- sample(values, size = 5000, replace = FALSE)
list_2 <- sample(values, size = 4000, replace = FALSE)
list_3 <- sample(values, size = 3000, replace = FALSE)
list_4 <- sample(values, size = 2000, replace = FALSE)
lists <- list(list_1, list_2, list_3, list_4)
# name the list (required for gplots)
names(lists) <- c("G1", "G2", "G3", "G4")
# get the venn table
v.table <- venn(lists)
# show venn table
print(v.table)
# plot Venn diagram
plot(v.table)
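
To cross-check the region counts independently of any plotting package, a minimal base R sketch along these lines may help. It tallies how many elements fall into each exclusive region. The "1010"-style pattern labels and the variable names are my own convention for this illustration, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
values <- 1:20000
lists <- list(
  G1 = sample(values, size = 5000),
  G2 = sample(values, size = 4000),
  G3 = sample(values, size = 3000),
  G4 = sample(values, size = 2000)
)

# Every element that appears in at least one list
universe <- unique(unlist(lists))

# Logical matrix: one row per element, one column per group
membership <- sapply(lists, function(s) universe %in% s)

# Collapse each row into a pattern such as "1010" (in G1 and G3 only)
pattern <- apply(membership, 1, function(r) paste(as.integer(r), collapse = ""))

# Number of elements in each exclusive region
region_counts <- table(pattern)
print(region_counts)

# Sanity check: regions whose pattern starts with "1" add up to the size of G1
in_g1 <- substr(names(region_counts), 1, 1) == "1"
stopifnot(sum(region_counts[in_g1]) == length(lists$G1))
```

These are exclusive counts, so each element is counted in exactly one region and the counts sum to the number of distinct elements.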

I now consider the matter solved. Thank you zx8754 for your help!
