John Gagnon

Reputation: 915

Accurate Venn diagrams using eulerr

I'm trying to use the eulerr package to create a Venn diagram from two lists, where the second list (b) is a subset of the first (a). Strangely, eulerr seems to think that list b contains one value that is unique to it. I can't figure out which value it thinks is unique.

https://pastebin.com/J7tPcfAt

> length(a)
[1] 3278

> length(b)
[1] 1318

When I check the overlap between the two lists, I get the expected results:

> length(which(a %in% b))
[1] 1318

> length(which((b %in% a)))
[1] 1318

> length(which(!(b %in% a)))
[1] 0

> length(which(!(a %in% b)))
[1] 1960

But when I use eulerr to plot a Venn diagram I get:

library(eulerr)
fit <- euler(list("A" = a, "B" = b))
plot(fit, counts = TRUE)


Notably, the number of values that eulerr reports as unique to A is one larger than what I get using

length(which(!(a %in% b)))

Any help understanding this behavior would be greatly appreciated!

Upvotes: 3

Views: 1989

Answers (1)

f.lechleitner

Reputation: 3812

I found what is causing this behaviour, although I can't explain why eulerr reacts this way: both a and b contain a duplicated value, and it is the same value in each.

> a[duplicated(a)]
[1] "Crybg3"
> b[duplicated(b)]
[1] "Crybg3"
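As a rough illustration of why a repeated entry matters, here is a minimal base R sketch using made-up vectors (not the data from the question). It only shows that element-wise counts (`%in%`) and set-wise counts (`intersect`, `setdiff`) can disagree when a value is repeated; it makes no claim about how eulerr handles duplicates internally.

```r
# Toy vectors (hypothetical data, not the vectors from the question)
a <- c("g1", "g2", "g3", "g3", "g4")  # "g3" appears twice
b <- c("g3", "g3", "g4")              # subset of a, with the same repeat

# Element-wise counts: every position is counted, repeats included
n_in_b   <- sum(a %in% b)     # positions of a whose value occurs in b
n_only_a <- sum(!(a %in% b))  # positions of a whose value is absent from b

# Set-wise counts: each distinct value is counted once
n_shared_set <- length(intersect(a, b))
n_only_a_set <- length(setdiff(a, b))

# The values that are repeated in each vector
dup_a <- unique(a[duplicated(a)])
dup_b <- unique(b[duplicated(b)])
```

Here `n_in_b` is 3 while `n_shared_set` is 2, so the two ways of counting the overlap differ by exactly the one repeated entry.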

If I remove the duplicated entry from both vectors, it works.

a1 <- a[!duplicated(a)]
b1 <- b[!duplicated(b)]

fit <- euler(list("A" = a1, "B" = b1))
plot(fit, counts = TRUE)

> fit
    original fitted residuals region_error
A       1960   1960         0            0
B          0      0         0            0
A&B     1317   1317         0            0

diag_error:  0 
stress:      0 
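To cross-check a table like this without eulerr, the region counts can be computed directly with base R set functions. The sketch below assumes two character vectors; `region_counts` is a hypothetical helper written for this illustration, and the toy vectors are made up.

```r
# Hypothetical helper (not part of eulerr): set-based region counts
# for two vectors, using base R only.
region_counts <- function(a, b) {
  a <- unique(a)  # drop repeated entries first
  b <- unique(b)
  c(A     = length(setdiff(a, b)),    # values only in a
    B     = length(setdiff(b, a)),    # values only in b
    "A&B" = length(intersect(a, b)))  # values in both
}

# Toy data (hypothetical); "g3" is repeated in both vectors
toy_a <- c("g1", "g2", "g3", "g3", "g4")
toy_b <- c("g3", "g3", "g4")

counts <- region_counts(toy_a, toy_b)
```

For the vectors in the question, the fitted table is consistent with this kind of count: 1960 + 1317 = 3277, which is length(a) minus the one repeated entry (3278 - 1), and 1317 is length(b) minus one (1318 - 1).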


Upvotes: 3
