Reputation: 31
First off, I would like to apologize for my basic question. I am sure that if I were an experienced user the other threads on this topic would have been enough, but I couldn't manage even after reading them. So if this annoys you, you're welcome to ignore it.
For those still wanting to help: I am trying to create a 5-way Venn diagram. My data is arranged in Excel as 5 columns (one per site, A-E), with each row giving one species' abundance (0 - 16) at each of the five sites.
I want to create a nice Venn diagram similar to this: https://i.sstatic.net/TeRSJ.png
I am sure it probably only takes a few clicks, but I can't manage to load my data in the right way. Which format should it be: a dataset, a list, a matrix?
R seemed to suggest I can only use presence/absence data (0/1); is that right?
Eventually, I figure I would use this command with x as my data:
venn(x, snames = c(""), ilabels = FALSE, counts = FALSE, zcolor = c("bw"),
transparency = 0.3, ellipse = FALSE, size = 15, cexil = 0.45, cexsn = 0.85,
...)
Can anyone show me what code to use? I can also upload my dataset if someone tells me how to do that here.
Thanks in advance
Upvotes: 0
Views: 9539
Reputation: 50728
Disclaimer 1: I'm not sure if your question is about how to calculate the counts per subgroup, or how to plot a 5-set Venn diagram. I'm assuming the latter.
Disclaimer 2: I find 5-set Venn diagrams extremely difficult to read. To the point of being useless. But that's my personal opinion.
If other R packages are an option, here is a worked-out 5-set example using VennDiagram (straight from the VennDiagram reference manual):
library(VennDiagram);
venn.plot <- draw.quintuple.venn(
area1 = 301, area2 = 321, area3 = 311, area4 = 321, area5 = 301,
n12 = 188, n13 = 191, n14 = 184, n15 = 177,
n23 = 194, n24 = 197, n25 = 190,
n34 = 190, n35 = 173, n45 = 186,
n123 = 112, n124 = 108, n125 = 108,
n134 = 111, n135 = 104, n145 = 104,
n234 = 111, n235 = 107, n245 = 110,
n345 = 100,
n1234 = 61, n1235 = 60, n1245 = 59,
n1345 = 58, n2345 = 57,
n12345 = 31,
category = c("A", "B", "C", "D", "E"),
fill = c("dodgerblue", "goldenrod1", "darkorange1", "seagreen3", "orchid3"),
cat.col = c("dodgerblue", "goldenrod1", "darkorange1", "seagreen3", "orchid3"),
cat.cex = 2,
margin = 0.05,
cex = c(
1.5, 1.5, 1.5, 1.5, 1.5, 1, 0.8, 1, 0.8, 1, 0.8, 1, 0.8, 1, 0.8,
1, 0.55, 1, 0.55, 1, 0.55, 1, 0.55, 1, 0.55, 1, 1, 1, 1, 1, 1.5),
ind = TRUE);
png("venn_5set.png");
grid.draw(venn.plot);
dev.off();
Your source table is in an atypical format. As I explain in my comments, you usually start with either a binary matrix (one column per set, membership of every observation indicated by 0's or 1's), or a list of set elements.
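To make the two input formats concrete, here is a minimal base-R sketch that converts between them. The toy matrix (species sp1 to sp4, sites A to C) and the names binary, sets, all_species and back are made up for illustration and are not your data.

```r
# Hypothetical toy data: rows are species, columns are sites
binary <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 1, 1,
    1, 1, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("sp1", "sp2", "sp3", "sp4"), c("A", "B", "C"))
)

# Binary matrix -> list of set elements (one character vector per site)
sets <- lapply(as.data.frame(binary), function(col) rownames(binary)[col == 1])

# List of set elements -> binary matrix
all_species <- sort(unique(unlist(sets)))
back <- sapply(sets, function(s) as.integer(all_species %in% s))
rownames(back) <- all_species

print(sets)
print(back)
```

As far as I know, a named list like sets is the form that VennDiagram::venn.diagram expects for its x argument, while count-based functions such as draw.quintuple.venn need the overlap sizes instead.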
To be honest, I'm less and less sure about what you are actually trying to do. I have a feeling that there might be a misconception about Venn diagrams. For example, let's take a look at the first rows of your table
# Read data
library(readxl);
data <- as.data.frame(read_excel("~/Downloads/dataset4venn.xlsx"));
rownames(data) <- data[, 1];
data <- data[, -1];
head(data);
# A B C D E
#1 8 8 7 8 10
#2 0 0 10 0 2
#3 0 0 0 0 3
#4 0 0 1 2 0
#5 1 0 1 0 2
#6 0 0 0 0 1
An observation is either the presence (encoded by 1) or the absence (encoded by 0) of a unique element (in your case a species) in a specific group (i.e. a sampling site). The number of sightings, as you call it, does not matter here: a Venn diagram explores the logical relations between different species sampled at different sites, or in other words which unique species are shared by sites A-E.
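A minimal sketch of that conversion, using the six rows printed above. The names abund and presence are only for illustration, and treating any abundance above zero as "present" is my assumption about what you want.

```r
# The first six rows of the table shown above (abundance per site)
abund <- data.frame(
  A = c(8, 0, 0, 0, 1, 0),
  B = c(8, 0, 0, 0, 0, 0),
  C = c(7, 10, 0, 1, 1, 0),
  D = c(8, 0, 0, 2, 0, 0),
  E = c(10, 2, 3, 0, 2, 1)
)

# Abundance -> presence/absence: any count above 0 becomes 1
presence <- (abund > 0) * 1L

# Number of species present at each site
print(colSums(presence))

# Number of species present at all five sites
print(sum(rowSums(presence) == ncol(presence)))
```

The abundances themselves are discarded at this step; only whether a species occurs at a site feeds into the diagram.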
Having said that and ignoring the number of sightings per site, you can show overlaps between different sites in the following 5-set Venn diagram. I first define a helper function cts to calculate counts per group/overlap, and then feed those numbers into draw.quintuple.venn.
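# Function to calculate the count per group/overlap:
# the number of species (rows) seen at least once at every site in `set`
# Note: data is a global variable
cts <- function(set) {
  df <- data;
  # Keep only the rows that are present (>= 1) at each requested site in turn
  for (s in set) df <- subset(df, df[[s]] >= 1);
  nrow(df);
}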
# Plot
library(VennDiagram);
venn.plot <- draw.quintuple.venn(
area1 = cts("A"), area2 = cts("B"), area3 = cts("C"),
area4 = cts("D"), area5 = cts("E"),
n12 = cts(c("A", "B")), n13 = cts(c("A", "C")), n14 = cts(c("A", "D")),
n15 = cts(c("A", "E")), n23 = cts(c("B", "C")), n24 = cts(c("B", "D")),
n25 = cts(c("B", "E")), n34 = cts(c("C", "D")), n35 = cts(c("C", "E")),
n45 = cts(c("D", "E")),
n123 = cts(c("A", "B", "C")), n124 = cts(c("A", "B", "D")),
n125 = cts(c("A", "B", "E")), n134 = cts(c("A", "C", "D")),
n135 = cts(c("A", "C", "E")), n145 = cts(c("A", "D", "E")),
n234 = cts(c("B", "C", "D")), n235 = cts(c("B", "C", "E")),
n245 = cts(c("B", "D", "E")), n345 = cts(c("C", "D", "E")),
n1234 = cts(c("A", "B", "C", "D")), n1235 = cts(c("A", "B", "C", "E")),
n1245 = cts(c("A", "B", "D", "E")), n1345 = cts(c("A", "C", "D", "E")),
n2345 = cts(c("B", "C", "D", "E")),
n12345 = cts(c("A", "B", "C", "D", "E")),
category = c("A", "B", "C", "D", "E"),
fill = c("dodgerblue", "goldenrod1", "darkorange1", "seagreen3", "orchid3"),
cat.col = c("dodgerblue", "goldenrod1", "darkorange1", "seagreen3", "orchid3"),
cat.cex = 2,
margin = 0.05,
cex = c(
1.5, 1.5, 1.5, 1.5, 1.5, 1, 0.8, 1, 0.8, 1, 0.8, 1, 0.8, 1, 0.8,
1, 0.55, 1, 0.55, 1, 0.55, 1, 0.55, 1, 0.55, 1, 1, 1, 1, 1, 1.5),
ind = TRUE);
png("venn_5set.png");
grid.draw(venn.plot);
dev.off();
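Typing all 31 arguments by hand is tedious, so here is a rough sketch of how they could be generated with combn instead. The helper overlap_counts is hypothetical (it is not part of VennDiagram), and the input is again the six rows printed earlier rather than your full table.

```r
# The first six rows of the table shown earlier (abundance per site)
abund <- data.frame(
  A = c(8, 0, 0, 0, 1, 0),
  B = c(8, 0, 0, 0, 0, 0),
  C = c(7, 10, 0, 1, 1, 0),
  D = c(8, 0, 0, 2, 0, 0),
  E = c(10, 2, 3, 0, 2, 1)
)

# Hypothetical helper: count the species present at every site of each combination
overlap_counts <- function(df) {
  idx <- seq_along(df)
  counts <- list()
  for (k in idx) {
    for (combo in combn(idx, k, simplify = FALSE)) {
      # Names follow the draw.quintuple.venn arguments: area1..area5, n12, ..., n12345
      name <- if (k == 1) paste0("area", combo) else paste0("n", paste(combo, collapse = ""))
      counts[[name]] <- sum(rowSums(df[, combo, drop = FALSE] >= 1) == k)
    }
  }
  counts
}

counts <- overlap_counts(abund)
str(counts)

# With VennDiagram installed, the list can be passed on together with the styling arguments:
# venn.plot <- do.call(VennDiagram::draw.quintuple.venn,
#                      c(counts, list(category = names(abund))))
```

Because every count is derived from the same table, the numbers are consistent with each other by construction, which avoids the "negative area" kind of error.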
Various R packages/internet sources offer helper functions to calculate overlaps based on e.g. a binary matrix or a list of set elements. For example, the R/Bioconductor package limma offers a function limma::vennCounts that calculates counts for all overlaps based on a binary matrix. So if you don't want to write your own function (like I did), you can also use those. Either way, in the case of more complex Venn diagrams, I would suggest not calculating overlaps by hand, as it's easy to make a mistake (see your error message).
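If you do end up with hand-computed counts, a quick consistency check before plotting can help. Reading the formula in your error message, a17 appears to be the region for species found at A, C and E only: n135 minus the parts that are also in B or D. The sketch below applies the same idea to every region. It is not VennDiagram's own code; the names incl, excl and is_subset are made up here, and the keys are site indices, so "135" stands for n135. The numbers are the ones from your follow-up post.

```r
# Inclusive overlap counts as posted, keyed by the site indices they cover
incl <- c(
  "1" = 104, "2" = 120, "3" = 117, "4" = 158, "5" = 107,
  "12" = 59, "13" = 39, "14" = 55, "15" = 41,
  "23" = 48, "24" = 71, "25" = 48,
  "34" = 53, "35" = 53, "45" = 62,
  "123" = 30, "124" = 44, "125" = 35,
  "134" = 34, "135" = 30, "145" = 38,
  "234" = 42, "235" = 35, "245" = 44, "345" = 40,
  "1234" = 28, "1235" = 25, "1245" = 33, "1345" = 27, "2345" = 32,
  "12345" = 24
)

# TRUE if every site index in key `a` also appears in key `b`
is_subset <- function(a, b) all(strsplit(a, "")[[1]] %in% strsplit(b, "")[[1]])

# Work from the largest overlap down:
# exclusive count = inclusive count - exclusive counts of all strictly larger regions
keys <- names(incl)[order(-nchar(names(incl)))]
excl <- setNames(numeric(length(keys)), keys)
for (k in keys) {
  contains_k <- vapply(keys, function(b) is_subset(k, b), logical(1))
  supersets <- keys[nchar(keys) > nchar(k) & contains_k]
  excl[[k]] <- incl[[k]] - sum(excl[supersets])
}

# Any negative exclusive count means the inputs contradict each other
negative_regions <- excl[excl < 0]
print(negative_regions)
```

Which region gets flagged depends on the exact numbers supplied, so it need not be the one named in an error message from a run with different inputs.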
Upvotes: 0
Reputation: 31
Hi Maurtis, I tried the script you posted. I calculated the overlaps in Excel and eventually got:
library(VennDiagram);
venn.plot <- draw.quintuple.venn(
area1 = 104, area2 = 120, area3 = 117, area4 = 158, area5 = 107,
n12 = 59, n13 = 39, n14 = 55, n15 = 41,
n23 = 48, n24 = 71, n25 = 48,
n34 = 53, n35 = 53, n45 = 62,
n123 = 30, n124 = 44, n125 = 35,
n134 = 34, n135 = 30, n145 = 38,
n234 = 42, n235 = 35, n245 = 44,
n345 = 40, n1234 = 28, n1235 = 25, n1245 = 33,
n1345 = 27, n2345 = 32,
n12345 = 24,
category = c("A", "B", "C", "D", "E"),
fill = c("dodgerblue", "goldenrod1", "darkorange1", "seagreen3", "orchid3"),
cat.col = c("dodgerblue", "goldenrod1", "darkorange1", "seagreen3", "orchid3"),
cat.cex = 2,
margin = 0.05,
cex = c(
1.5, 1.5, 1.5, 1.5, 1.5, 1, 0.8, 1, 0.8, 1, 0.8, 1, 0.8, 1, 0.8,
1, 0.55, 1, 0.55, 1, 0.55, 1, 0.55, 1, 0.55, 1, 1, 1, 1, 1, 1.5),
ind = TRUE);
png("venn_5set.png");
grid.draw(venn.plot);
dev.off();
but I got an error:
error in draw.quintuple.venn(area1 = 104, area2 = 120, area3 = 117, area4 = 158, :Impossible: a17 <- n135 - a27 - a29 - a31 produces negative area
Which is a17?
Upvotes: 0