Reputation: 3555
I have tables with values for Venn Diagrams which I am trying to read into R and parse in order to plot with the VennDiagram package. My tables look like this:
H3K27AC.bed H3K4ME3.bed gencode.bed Total Name
X 19184 gencode.bed
X 6843 H3K4ME3.bed
X X 3942 H3K4ME3.bed|gencode.bed
X 5097 H3K27AC.bed
X X 1262 H3K27AC.bed|gencode.bed
X X 4208 H3K27AC.bed|H3K4ME3.bed
X X X 9222 H3K27AC.bed|H3K4ME3.bed|gencode.bed
I can read the table in as a dataframe like this:
> venn_table_df<-read.table(venn_table_file,header = TRUE,sep = "\t",stringsAsFactors = FALSE)
> venn_table_df
H3K27AC.bed H3K4ME3.bed gencode.bed Total Name
1 X 19184 gencode.bed
2 X 6843 H3K4ME3.bed
3 X X 3942 H3K4ME3.bed|gencode.bed
4 X 5097 H3K27AC.bed
5 X X 1262 H3K27AC.bed|gencode.bed
6 X X 4208 H3K27AC.bed|H3K4ME3.bed
7 X X X 9222 H3K27AC.bed|H3K4ME3.bed|gencode.bed
I can get the categories for the Venn diagram from the table like this:
> venn_categories<-colnames(venn_table_df)[!colnames(venn_table_df) %in% c("Total","Name")]
> venn_categories
[1] "H3K27AC.bed" "H3K4ME3.bed" "gencode.bed"
And I can even make a summary table that is a bit easier to read, like this:
> venn_summary<-venn_table_df[!colnames(venn_table_df) %in% venn_categories]
> venn_summary
Total Name
1 19184 gencode.bed
2 6843 H3K4ME3.bed
3 3942 H3K4ME3.bed|gencode.bed
4 5097 H3K27AC.bed
5 1262 H3K27AC.bed|gencode.bed
6 4208 H3K27AC.bed|H3K4ME3.bed
7 9222 H3K27AC.bed|H3K4ME3.bed|gencode.bed
But what is stumping me is how to get the values out of the table and assign them correctly to the areas of the Venn diagram. For reference, a call to the triple Venn function with the values entered by hand looks like this:
n1<-5097
n2<-6843
n3<-19184
n12<-4208
n13<-1262
n23<-3942
n123<-9222
venn <-draw.triple.venn(area1=n1+n12+n13+n123,
area2=n2+n23+n12+n123,
area3=n3+n23+n13+n123,
n12=n12+n123,
n13=n13+n123,
n23=n23+n123,
n123=n123,
category=venn_categories,
fill=c('red','blue','green'),
alpha=c(rep(0.3,3)))
But obviously this requires setting the values manually, which is not desirable since I have many of these data sets and also need to scale up to 4-way and 5-way Venn diagrams. How can I get R to find the correct values for each field in the Venn diagram? I have tried several methods using grep, grepl, and subsetting the dataframe for the rows that match the categories for each area of the plot, but none of them has worked correctly. Any suggestions? BTW, this data is output from the mergePeaks program of the HOMER software package.
Upvotes: 1
Views: 1885
Reputation: 2628
In case someone finds this useful, there is now a very straightforward procedure to get these numbers into an approximately proportional Venn diagram. One of the ways to create a diagram with the nVennR package is from scratch. As explained in the vignette, the values for each region are entered in a particular order, which happens to be the same as the row order in your table. The only difference is that nVennR expects one more value at the beginning, corresponding to the external region (this value should be 0, although it is ignored in any case). This makes the procedure very easy:
> vt <- read.table('clipboard', header = TRUE)
> vt
H3K27AC.bed H3K4ME3.bed gencode.bed Total Name
1 0 0 X 19184 gencode.bed
2 0 X 0 6843 H3K4ME3.bed
3 0 X X 3942 H3K4ME3.bed|gencode.bed
4 X 0 0 5097 H3K27AC.bed
5 X 0 X 1262 H3K27AC.bed|gencode.bed
6 X X 0 4208 H3K27AC.bed|H3K4ME3.bed
7 X X X 9222 H3K27AC.bed|H3K4ME3.bed|gencode.bed
> myV <- createVennObj(nSets = 3, sNames = c('H3K27Ac', 'H3K4ME3', 'gencode'), sSizes = c(0, vt$Total))
> vp <- plotVenn(nVennObj = myV)
And the result is the plotted three-set diagram (figure not shown).
Another advantage of this procedure is that it is scalable to a larger number of groups.
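To make the ordering less dependent on how the rows happen to be sorted, here is a minimal sketch in base R that derives each row's region position from its X marks. It assumes the regions are numbered like a binary counter with the first set as the most significant bit, which is what the row order of the table above suggests; check the nVennR vignette before relying on it. The names set_cols, region and s_sizes are made up for this illustration.

```r
# Table as shown in the answer above ("0" = not in set, "X" = in set).
vt <- data.frame(
  H3K27AC.bed = c("0", "0", "0", "X", "X", "X", "X"),
  H3K4ME3.bed = c("0", "X", "X", "0", "0", "X", "X"),
  gencode.bed = c("X", "0", "X", "0", "X", "0", "X"),
  Total = c(19184, 6843, 3942, 5097, 1262, 4208, 9222),
  stringsAsFactors = FALSE
)

# Every column except the bookkeeping ones is treated as a set.
set_cols <- setdiff(colnames(vt), c("Total", "Name"))
n_sets <- length(set_cols)

# Read each row's X marks as a binary number.
# Assumption: first set = most significant bit, as in the table's row order.
bits <- (vt[set_cols] == "X") * 1
region <- as.vector(bits %*% 2^((n_sets - 1):0))

# Position 1 is the external region (left at 0); a region that is
# missing from the table also stays at 0.
s_sizes <- numeric(2^n_sets)
s_sizes[region + 1] <- vt$Total
```

The resulting s_sizes can then be passed as sSizes to createVennObj (with nSets = n_sets) in place of c(0, vt$Total), and the same code should carry over to tables with four or five sets.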
Upvotes: 1
Reputation: 3555
I think I figured it out, using regular expressions to search the table for the correct entries for the plot. Here is the full workflow:
# load packages
library('VennDiagram')
library('gridExtra')
# read in the venn text
venn_table_df<-read.table(venn_table_file,header = TRUE,sep = "\t",stringsAsFactors = FALSE)
venn_table_df looks like this:
> venn_table_df
H3K27AC.bed H3K4ME3.bed gencode.bed Total Name
1 X 19184 gencode.bed
2 X 6843 H3K4ME3.bed
3 X X 3942 H3K4ME3.bed|gencode.bed
4 X 5097 H3K27AC.bed
5 X X 1262 H3K27AC.bed|gencode.bed
6 X X 4208 H3K27AC.bed|H3K4ME3.bed
7 X X X 9222 H3K27AC.bed|H3K4ME3.bed|gencode.bed
> # recreate it with this btw
> dput(venn_table_df)
structure(list(H3K27AC.bed = c("", "", "", "X", "X", "X", "X"
), H3K4ME3.bed = c("", "X", "X", "", "", "X", "X"), gencode.bed = c("X",
"", "X", "", "X", "", "X"), Total = c(19184L, 6843L, 3942L, 5097L,
1262L, 4208L, 9222L), Name = c("gencode.bed", "H3K4ME3.bed",
"H3K4ME3.bed|gencode.bed", "H3K27AC.bed", "H3K27AC.bed|gencode.bed",
"H3K27AC.bed|H3K4ME3.bed", "H3K27AC.bed|H3K4ME3.bed|gencode.bed"
)), .Names = c("H3K27AC.bed", "H3K4ME3.bed", "gencode.bed", "Total",
"Name"), class = "data.frame", row.names = c(NA, -7L))
Then parse the table:
# get the venn categories
venn_categories<-colnames(venn_table_df)[!colnames(venn_table_df) %in% c("Total","Name")]
# make a summary table
venn_summary<-venn_table_df[!colnames(venn_table_df) %in% venn_categories]
venn_summary
# get the areas for the venn; add up all the overlaps that contain the given category
# area1
area_n1<-sum(venn_summary[grep(pattern = paste0("(?=.*",venn_categories[1],")"),x = venn_summary$Name,perl = TRUE),][["Total"]])
# area2
area_n2<-sum(venn_summary[grep(pattern = paste0("(?=.*",venn_categories[2],")"),x = venn_summary$Name,perl = TRUE),][["Total"]])
# area3
area_n3<-sum(venn_summary[grep(pattern = paste0("(?=.*",venn_categories[3],")"),x = venn_summary$Name,perl = TRUE),][["Total"]])
# n12
area_n12<-sum(venn_summary[grep(pattern = paste0("(?=.*",venn_categories[1],")","(?=.*",venn_categories[2],")"),x = venn_summary$Name,perl = TRUE),][["Total"]])
# n13
area_n13<-sum(venn_summary[grep(pattern = paste0("(?=.*",venn_categories[1],")","(?=.*",venn_categories[3],")"),x = venn_summary$Name,perl = TRUE),][["Total"]])
# n23
area_n23<-sum(venn_summary[grep(pattern = paste0("(?=.*",venn_categories[2],")","(?=.*",venn_categories[3],")"),x = venn_summary$Name,perl = TRUE),][["Total"]])
# n123
area_n123<-sum(venn_summary[grep(pattern = paste0("(?=.*",venn_categories[1],")","(?=.*",venn_categories[2],")","(?=.*",venn_categories[3],")"),x = venn_summary$Name,perl = TRUE),][["Total"]])
venn <-draw.triple.venn(area1=area_n1,
area2=area_n2,
area3=area_n3,
n12=area_n12,
n13=area_n13,
n23=area_n23,
n123=area_n123,
category=venn_categories,
fill=c('red','blue','green'),
alpha=c(rep(0.3,3)))
The key was to use regular expressions to get only the table entries that include all of the categories for a given Venn area. This is a little more involved than I was hoping for, and it will require manual setup to adapt to four-way and five-way Venn diagrams, but it works so far. I am open to other suggestions that might simplify the process and scale up more easily.
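One possible way to avoid the manual setup is to skip the Name column and use the X marker columns directly, looping over every combination of sets. The following is only a sketch under that assumption; overlap_total, membership, combos and areas are hypothetical names, not part of VennDiagram. It also sidesteps a subtlety of the regex approach: the "." in names like H3K27AC.bed is a regex wildcard, which happens to be harmless with this data.

```r
# Rebuild the example table from the dput() output above (Name is not needed here).
venn_table_df <- data.frame(
  H3K27AC.bed = c("", "", "", "X", "X", "X", "X"),
  H3K4ME3.bed = c("", "X", "X", "", "", "X", "X"),
  gencode.bed = c("X", "", "X", "", "X", "", "X"),
  Total = c(19184L, 6843L, 3942L, 5097L, 1262L, 4208L, 9222L),
  stringsAsFactors = FALSE
)
venn_categories <- setdiff(colnames(venn_table_df), c("Total", "Name"))

# Logical matrix: TRUE where a row is marked "X" for a category.
membership <- venn_table_df[venn_categories] == "X"

# Hypothetical helper: sum Total over all rows that contain every set in idx.
overlap_total <- function(idx) {
  rows <- rowSums(membership[, idx, drop = FALSE]) == length(idx)
  sum(venn_table_df$Total[rows])
}

# Every non-empty combination of set indices: 1, 2, 3, 1-2, 1-3, 2-3, 1-2-3.
n_sets <- length(venn_categories)
combos <- unlist(
  lapply(seq_len(n_sets), function(k) combn(n_sets, k, simplify = FALSE)),
  recursive = FALSE
)
names(combos) <- vapply(combos, paste, character(1), collapse = "")

# Named vector of totals, e.g. areas[["12"]] is the n12 value.
areas <- vapply(combos, overlap_total, numeric(1))
```

The entries of areas line up with the arguments of draw.triple.venn: areas[["1"]] is area1, areas[["12"]] is n12, areas[["123"]] is n123, and so on. With a fourth or fifth X column the same code produces the extra combinations (for example "1234") that draw.quad.venn and draw.quintuple.venn ask for.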
Upvotes: 1