stuwest

Reputation: 928

Using ggplot to plot overall dataset and subsets

I have a dataset organised into subcategories and sub-subcategories, along the lines of nested bullet points:

-1
 -1a
  -1ai
  -1aii
 -1b
  -1bi

...and so on.

I want to use ggplot2 to make a dotplot which shows all data for 1 followed by data for 1a only, followed by data for 1ai only, and so on.

Example dataset:

x <- data.frame(cat=1, subA=letters[rep(1:5,each=10)], 
subB=as.character(as.roman(rep(1:5,5,each=2))),value=rnorm(50,20,7))

> head(x)
  cat subA subB    value
1   1    a    I 26.75573
2   1    a    I 12.52218
3   1    a   II 24.53499
4   1    a   II 23.21012
5   1    a  III 11.18173
6   1    a  III 25.01914

I want to end up with a chart that looks something like this:

[Image: subset and overall data dotplot]

I was able to make this plot by doing lots of subsetting and rbinding to make a massively redundant derivative data frame, but this seems like a clear example of Doing It Wrong.

x2 <- with(x,rbind(cbind(key="1",x), 
cbind(key="1 a",x[paste(cat,subA) == "1 a",]), 
cbind(key="1 a I",x[paste(cat,subA,subB) == "1 a I",]), 
cbind(key="1 a II",x[paste(cat,subA,subB) == "1 a II",])))

library(ggplot2)
library(plyr)  # for desc()
ggplot(x2, aes(x=reorder(key,desc(key)), y=value)) +
  geom_point(position=position_jitter(width=0.1,height=0)) +
  coord_flip() + scale_x_discrete("Category")

Is there a better way of doing this? A related problem is that it would be nice if each value always had the same amount of jitter added to it, whether it was plotted against "1" or "1 a" or "1 a II", but there I'm not even sure where to start.

Upvotes: 2

Views: 772

Answers (1)

Arun

Reputation: 118799

I can't think of a way other than reconstructing your data with separate groups as shown below:

x.m1 <- x[c("cat", "value")]
x.m2 <- do.call(rbind, lapply(split(x, interaction(x[, 1:2])), function(y) {
    y$cat <- do.call(paste0, y[, 1:2])
    y[c("cat", "value")]
}))
x.m3 <- do.call(rbind, lapply(split(x, interaction(x[, 1:3])), function(y) {
    y$cat <- do.call(paste0, y[, 1:3])
    y[c("cat", "value")]
}))

y <- rbind(x.m1, x.m2, x.m3)

ggplot(data = y, aes(x = value, y = cat)) + geom_point()

[Image: ggplot2_multiple_levels]

Note: You should reorder the levels of the cat column in y to put the y-axis in the order you want. I'll leave that to you.
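For anyone who wants a starting point for that reordering, here is a minimal sketch. It assumes you want each parent immediately followed by its children (1, 1a, 1aI, 1aII, ...) and that subB always holds Roman numerals. The names keys and lev are made up for this example, set.seed() is only there for reproducibility, and y is rebuilt with vectorised paste0() so the snippet runs on its own; it should hold the same rows as the y above, just in a different order.

```r
# Example data from the question (seed is an assumption, for reproducibility)
set.seed(1)
x <- data.frame(cat = 1, subA = letters[rep(1:5, each = 10)],
                subB = as.character(as.roman(rep(1:5, 5, each = 2))),
                value = rnorm(50, 20, 7))

# Stack the three levels of the hierarchy; paste0() is vectorised,
# so no split()/lapply() is needed for this step
y <- rbind(
  data.frame(cat = as.character(x$cat), value = x$value),
  data.frame(cat = paste0(x$cat, x$subA), value = x$value),
  data.frame(cat = paste0(x$cat, x$subA, x$subB), value = x$value)
)

# One row per leaf of the hierarchy, sorted with subB as a number (IV < V)
keys <- unique(x[c("cat", "subA", "subB")])
keys <- keys[order(keys$cat, keys$subA, as.integer(as.roman(keys$subB))), ]

# Interleave parent / child / grandchild labels, keeping first occurrences
lev <- unique(as.vector(t(cbind(
  as.character(keys$cat),
  paste0(keys$cat, keys$subA),
  paste0(keys$cat, keys$subA, keys$subB)
))))

# Reverse so that "1" ends up at the top of the y-axis
y$cat <- factor(y$cat, levels = rev(lev))

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(data = y, aes(x = value, y = cat)) + geom_point())
}
```

ggplot2 draws the first factor level at the bottom of a discrete y-axis, hence the rev().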

Edit: Following @Justin's suggestion, you could do something like this:

x.m1 <- x
x.m1$grp <- x$cat
x.m2 <- do.call(rbind, lapply(split(x, interaction(x[, 1:2])), function(y) {
    y$grp <- do.call(paste0, y[, 1:2])
    y
}))
x.m3 <- do.call(rbind, lapply(split(x, interaction(x[, 1:3])), function(y) {
    y$grp <- do.call(paste0, y[, 1:3])
    y
}))

y <- rbind(x.m1, x.m2, x.m3)

ggplot(data = y, aes(x = value, y = grp)) + geom_point(aes(colour=subA, shape=subB))

[Image: ggplot2_multiple_levels_color_shape]
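On the related jitter question: one possible approach, sketched here rather than tested against your real data, is to draw one random offset per observation before stacking and then plot against a numeric position instead of using position_jitter(). The columns jit and pos and the ±0.1 width are assumptions made up for this example, and the seed is only there for reproducibility.

```r
# Example data from the question (seed is an assumption, for reproducibility)
set.seed(42)
x <- data.frame(cat = 1, subA = letters[rep(1:5, each = 10)],
                subB = as.character(as.roman(rep(1:5, 5, each = 2))),
                value = rnorm(50, 20, 7))

# One fixed vertical offset per observation, drawn once
x$jit <- runif(nrow(x), -0.1, 0.1)

# Stack the three hierarchy levels; every copy of a row keeps its offset
y <- rbind(
  cbind(grp = as.character(x$cat), x),
  cbind(grp = paste0(x$cat, x$subA), x),
  cbind(grp = paste0(x$cat, x$subA, x$subB), x)
)
y$grp <- factor(y$grp)  # default (sorted) level order; reorder as needed

# Numeric axis position = integer code of the group + the stored offset
y$pos <- as.integer(y$grp) + y$jit

# Plot only if ggplot2 is available; label the numeric axis with the groups
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data = y, aes(x = value, y = pos)) +
    geom_point(aes(colour = subA)) +
    scale_y_continuous("Category",
                       breaks = seq_along(levels(y$grp)),
                       labels = levels(y$grp))
  print(p)
}
```

Because the offset lives in the data rather than in the position adjustment, a given point sits at the same distance from its row's centre line whether it is drawn under "1", "1a" or "1aII".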

Upvotes: 2
