mandy

Reputation: 503

R graph: label by group

My data are clustered, with multiple observations within each group. I generated a caterpillar plot and want one label per group (zipid) rather than one per line. My current graph and code look like this:

  text = hosp_new[,c("zipid")]
  ggplot(hosp_new, aes(x = id, y = oe, colour = zipid, shape = group)) +
  # theme(panel.grid.major = element_blank()) +
  geom_point(size=1) +
  scale_shape_manual(values = c(1, 2, 4)) +
  geom_errorbar(aes(ymin = low_ci, ymax = high_ci)) +
  geom_smooth(method = lm, se = FALSE) +
  scale_linetype_manual(values = linetype) +
  geom_segment(aes(x = start_id, xend = end_id, y = region_oe, yend = region_oe, linetype = "4", size = 1.2)) +
  geom_ribbon(aes(ymin = region_low_ci, ymax = region_high_ci), alpha=0.2, linetype = "blank") +
  geom_hline(yintercept = 1, alpha = 0.2, colour = "red", size = 1, show.legend = FALSE) +
  scale_size_identity() +
  scale_x_continuous(name = "hospital id", breaks = seq(0,210, by = 10)) +
  scale_y_continuous(name = "O:E ratio", breaks = seq(0,7, by = 1)) +
  geom_text(aes(label = text), position = position_stack(vjust = 10.0), size = 2)

[Image: caterpillar plot]

Each color represents a region. I want just one label per region, but I don't know how to remove the duplicated labels in this graph. Any ideas?

Upvotes: 2

Views: 3569

Answers (1)

eipi10

Reputation: 93761

The key is to have geom_text return only one value for each zipid, rather than multiple values. If we want each zipid label located in the middle of its group, then we can use the average value of id as the x-coordinate for each label. In the code below, we use stat_summaryh (from the ggstance package) to calculate that average id value for the x-coordinate of the label and return a single label for each zipid.

library(ggplot2)
theme_set(theme_bw())
library(ggstance)

# Fake data
set.seed(300)
dat = data.frame(id=1:100, y=cumsum(rnorm(100)), 
                 zipid=rep(LETTERS[1:10], c(10, 5, 20, 8, 7, 12, 7, 10, 13,8)))

ggplot(dat, aes(id, y, colour=zipid)) +
  geom_segment(aes(xend=id, yend=0)) +
  stat_summaryh(fun.x=mean, aes(label=zipid, y=1.02*max(y)), geom="text") +
  guides(colour="none")  # colour=FALSE is deprecated in recent ggplot2 versions

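
If you would rather not depend on ggstance, the same idea (one row per zipid, with the mean id as the x-coordinate) can be sketched by summarising the data yourself and passing the result to geom_text. This is only an illustration on the fake data above: label_dat is a name made up here, and theme(legend.position = "none") is swapped in to hide the legend.

```r
library(ggplot2)

# Same fake data as in the answer
set.seed(300)
dat <- data.frame(id = 1:100, y = cumsum(rnorm(100)),
                  zipid = rep(LETTERS[1:10], c(10, 5, 20, 8, 7, 12, 7, 10, 13, 8)))

# One row per zipid: the mean id becomes the label's x position
# (label_dat is a hypothetical helper name, not part of the original answer)
label_dat <- aggregate(id ~ zipid, data = dat, FUN = mean)

# Mirror the 1.02 * max(y) vertical placement used in the answer
label_dat$y <- 1.02 * max(dat$y)

p <- ggplot(dat, aes(id, y, colour = zipid)) +
  geom_segment(aes(xend = id, yend = 0)) +
  # geom_text gets its own data, so it draws one label per row of label_dat
  geom_text(data = label_dat, aes(label = zipid)) +
  theme(legend.position = "none")
p
```

Because label_dat has exactly one row per zipid, geom_text has nothing to duplicate. With the real data, the same summary could be computed on hosp_new, grouping by zipid.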

You could also use faceting, as mentioned by @user20650. In the code below, panel.spacing.x=unit(0,'pt') removes the space between facet panels, while expand=c(0,0.5) adds 0.5 units of padding on the sides of each panel. Together, these ensure constant spacing between tick marks, even across facets.

ggplot(dat, aes(id, y, colour=zipid)) +
  geom_segment(aes(xend=id, yend=0)) +
  facet_grid(. ~ zipid, scales="free_x", space="free_x") +
  guides(colour="none") +
  theme_classic() +
  scale_x_continuous(breaks=0:nrow(dat), 
                     labels=c(rbind(seq(0,100,5),'','','',''))[1:(nrow(dat)+1)], 
                     expand=c(0,0.5)) +
  theme(panel.spacing.x = unit(0,"pt")) 

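
The labels argument in the faceting code is fairly dense, so here is a small base-R sketch of what it builds: a break at every id, but a printed label only at every fifth one. The names n, major, and tick_labels are made up for this illustration, with n standing in for nrow(dat).

```r
n <- 100                  # stands in for nrow(dat) in the answer
major <- seq(0, n, 5)     # positions that should show a label

# rbind() stacks the labels on top of four rows of empty strings, and c()
# reads the resulting matrix column by column, so each label is followed
# by four blanks. Trimming to n + 1 entries matches breaks = 0:n.
tick_labels <- c(rbind(major, "", "", "", ""))[1:(n + 1)]

head(tick_labels, 11)
```

The result has one entry per break in 0:n, which is what scale_x_continuous needs when breaks and labels are both given as vectors.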

Upvotes: 5
