Reputation: 10604
I have some data and am trying to make boxplots with an overlay of jittered points. My issue is with the points, so we'll stick with that.
Here's the data:
> dput(test)
structure(list(var1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), .Label = c("A", "B", "C", "D",
"E", "F", "G", "H", "I"), class = "factor"), var2 = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7"), class = "factor"), response1 = c(5L,
6L, 5L, 5L, 5L, 5L, 4L, 6L, 6L, 5L, 5L, 6L, 6L, 4L, 1L, 1L, NA,
1L, NA, NA, 1L, 1L, 1L, NA, 1L, NA, NA, 1L, 5L, 5L, 4L, 5L, 3L,
2L, 3L, 1L, 1L, NA, 1L, NA, NA, 1L, NA, NA, 2L, NA, 3L, 1L, NA,
NA, NA, 4L, NA, 4L, 5L, NA, NA, NA, 1L, NA, 1L, 1L, NA), response2 = c(2L,
2L, 2L, 2L, 2L, 2L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 5L, 5L, NA,
5L, NA, NA, 5L, 5L, 5L, NA, 5L, NA, NA, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, NA, 5L, NA, NA, 5L, NA, NA, 5L, NA, 5L, 5L, NA,
NA, NA, 5L, NA, 5L, 5L, NA, NA, NA, 5L, NA, 5L, 5L, NA), response3 = c(4L,
5L, 1L, 1L, 4L, 1L, 1L, 4L, 5L, 1L, 1L, 5L, NA, 1L, 4L, NA, NA,
NA, 3L, 2L, NA, 4L, NA, NA, NA, 3L, NA, NA, 4L, NA, 1L, NA, 3L,
NA, 2L, 4L, NA, NA, NA, NA, NA, NA, NA, 2L, 1L, 1L, NA, NA, 1L,
NA, 3L, 1L, NA, NA, NA, 1L, NA, 3L, 1L, NA, NA, NA, 1L)), .Names = c("var1",
"var2", "response1", "response2", "response3"), class = "data.frame", row.names = c(NA,
-63L))
I used reshape2 to melt my data, for faceting and to simplify the plot command:
library(reshape2)
test_melted <- melt(test, id.vars = c("var1", "var2"), na.rm = TRUE)
And here was the plot I created:
library(ggplot2)
p <- ggplot(test_melted, aes(x = var1, y = value)) + geom_point()
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p
That yields this:
Looked normal enough, but then I noticed it seemed like there were more points per facet/factor level than there should be. I narrowed it down to just one level of var1:
test_subset <- test_melted[test_melted$var1 == "E", ]
nrow(test_subset)
[1] 18
summary(test_subset)
      var1    var2        variable     value
 E      :18   V1:3   response1:7   Min.   :1
 A      : 0   V2:2   response2:7   1st Qu.:3
 B      : 0   V3:3   response3:4   Median :5
 C      : 0   V4:2                 Mean   :4
 D      : 0   V5:3                 3rd Qu.:5
 F      : 0   V6:2                 Max.   :5
 (Other): 0   V7:3
So we should have 18 points plotted in total (7 for response1, 7 for response2, and 4 for response3). Let's try it:
p <- ggplot(test_subset, aes(x = var1, y = value)) + geom_point()
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p
I count 11 points in the response1 facet, 8 in response2, and 8 in response3.
This has got to be something silly I'm missing. I've done plenty of faceting with dot plots and just never had this happen (or never noticed!).
Things I tried:
coord_flip()
test_subset <- droplevels(test_subset), in case empty factor levels were messing with something
facet_grid(~variable) vs. facet_grid(.~variable) vs. facet_grid(variable~) vs. facet_grid(variable~.)
As one last note, I get different numbers of points depending on whether or not I facet. With faceting, I get 11 + 8 + 8 = 27; if I remove the facet_grid(~variable), I get 23.
Thanks for any suggestions!
Upvotes: 3
Views: 1304
Reputation: 49053
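The problem is not due to faceting; it is due to the fact that you use two geoms in your plot. geom_point draws every observation at its exact position, so observations with the same value land on top of each other and show up as a single dot, and then geom_jitter draws all of them once more, at random positions. That's why you see one extra point for each distinct value in a panel: 4 + 1 + 4 = 9 extra dots with faceting (27 in total), and 5 extra dots without it (23 in total).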
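The counts from the question can be checked without plotting anything. Here is a minimal sketch in base R, assuming the 18 non-missing values for var1 == "E" are typed in by hand from the question's data; the names obs, n_obs, n_distinct, visible and unfaceted are made up for this illustration. It also assumes that no two jittered dots happen to overlap.

```r
# Values for var1 == "E", copied by hand from the question's data (NAs dropped).
obs <- data.frame(
  variable = rep(c("response1", "response2", "response3"), times = c(7, 7, 4)),
  value    = c(5, 5, 4, 5, 3, 2, 3,   # response1
               5, 5, 5, 5, 5, 5, 5,   # response2
               4, 1, 3, 2)            # response3
)

# geom_jitter: one dot per observation.
n_obs <- tapply(obs$value, obs$variable, length)

# geom_point: identical values overplot, so one visible dot per distinct value.
n_distinct <- tapply(obs$value, obs$variable, function(v) length(unique(v)))

# Visible dots per facet when both geoms are drawn.
visible <- n_obs + n_distinct
visible
# response1 response2 response3
#        11         8         8

# Without faceting, all 18 observations share a single panel.
unfaceted <- nrow(obs) + length(unique(obs$value))
unfaceted
# [1] 23
```

Both results match what was counted in the plots: 11, 8 and 8 with faceting, and 23 without.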
If you remove the call to geom_point, everything is back to normal:
p <- ggplot(test_subset, aes(x = var1, y = value))
p <- p + facet_grid(~variable) + coord_flip()
p <- p + geom_jitter(position = position_jitter(width=0.2, height = 0.2))
p
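Since the end goal in the question is boxplots with jittered points on top, the same double drawing can come back there, because geom_boxplot draws its own points for outliers. A hedged sketch, assuming ggplot2 is installed and again using hand-typed values for var1 == "E" rather than the full melted data: outlier.shape = NA hides the boxplot's outlier points, so geom_jitter remains the only layer that draws individual observations.

```r
library(ggplot2)

# Values for var1 == "E", copied by hand from the question's data (NAs dropped).
test_subset <- data.frame(
  var1     = "E",
  variable = rep(c("response1", "response2", "response3"), times = c(7, 7, 4)),
  value    = c(5, 5, 4, 5, 3, 2, 3,   # response1
               5, 5, 5, 5, 5, 5, 5,   # response2
               4, 1, 3, 2)            # response3
)

p <- ggplot(test_subset, aes(x = var1, y = value))
# Boxplot layer with its outlier points hidden, so they are not drawn twice.
p <- p + geom_boxplot(outlier.shape = NA)
# The only layer that draws individual observations.
p <- p + geom_jitter(position = position_jitter(width = 0.2, height = 0.2))
p <- p + facet_grid(~variable) + coord_flip()
p
```

With this layout each facet should show exactly as many dots as it has observations (7, 7 and 4 here).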
Upvotes: 2