suknat

Reputation: 335

ggplot with scale_shape_manual when shapes < colours

I have data in a CSV file like this:

Year,A,B,C,D,E,F,G
2007,3.35,,,,,,
2008,3.54,3.59,,,,,
2009,3.22,3.46,4.43,,,,
2010,3.82,3.63,4.64,,,,
2011,2.91,3.74,4.5,4.13,4.38,,
2012,3.85,3.57,4.13,4,4,4,
2013,4.33,2.93,4.63,4.71,4.25,,
2014,4.73,4,4.81,4.66,4.33,,4
2015,,,4.89,4.68,,,

I'm trying to plot it like this:

library(ggplot2)
library(reshape2)  # provides melt()

scores_raw <- read.csv("scores.csv", header = TRUE, fill = TRUE)

scores_melt <- melt(scores_raw, id = "Year")

scores_symb <- c(15, 17, 16, 16, 16, 16, 16)  

plot_scores <- ggplot(scores_melt, aes(x=Year, y=value, colour=variable, shape=variable))
plot_scores +
  geom_line() + 
  geom_point(size = 10, alpha = 0.6) + 
  scale_shape_manual(values = scores_symb, 
                     name="Cohort\nSize",
                     labels=c("200", "100", "25")) +
  ylab("Score (5 = max)") + 
  scale_y_continuous(limits = c(0, 5)) +
  theme_bw() +
  theme(
    text = element_text(size=30)
    , axis.title.y=element_text(vjust=1.5)
    , axis.title.x=element_text(vjust=0.1)
    , plot.background = element_rect(fill = "transparent",colour = NA)
    , legend.justification=c(0,0), legend.position=c(0,0) #legend.position="none"
    , legend.background = element_rect(fill="transparent", size=.5, linetype="dotted")
  )

As you can tell, I've got 7 series but only want to tell them apart by 3 cohort sizes (i.e. shape).

I would like the legend to show only the three shapes that distinguish the three cohort sizes. At the moment, I can produce either a single legend with shapes and colours combined, or (as in the code above) two legends: one with shapes (4 of which are NA) and one with colours.

Help please!

Upvotes: 0

Views: 2101

Answers (1)

Axeman

Reputation: 35392

You are trying to map cohort size to shape. Instead of hacking the scales, actually map cohort size to shape (and not variable). You can do this by creating a new variable called cohort_size.

Read in data

scores_raw <- read.table(text = "Year,A,B,C,D,E,F,G
2007,3.35,,,,,,
2008,3.54,3.59,,,,,
2009,3.22,3.46,4.43,,,,
2010,3.82,3.63,4.64,,,,
2011,2.91,3.74,4.5,4.13,4.38,,
2012,3.85,3.57,4.13,4,4,4,
2013,4.33,2.93,4.63,4.71,4.25,,
2014,4.73,4,4.81,4.66,4.33,,4
2015,,,4.89,4.68,,,", sep = ",", header = TRUE)

Melt and create the new variable

scores_symb <- c(15, 17, 16, 16, 16, 16, 16)
scores_melt <- reshape2::melt(scores_raw, id = "Year")
# Add the new variable
scores_melt$cohort_size <- scores_melt$variable
# Correctly map the levels
levels(scores_melt$cohort_size) <- scores_symb
# Reorder the levels (you might want to add labels here; going by the question's
# legend labels, that would be labels = c(200, 25, 100) for levels 15, 16, 17)
scores_melt$cohort_size <- factor(scores_melt$cohort_size, levels = 15:17)
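The remapping above works because assigning duplicate values to levels() merges the matching factor levels. As a rough alternative sketch, the same grouping can be written with a named lookup vector, which keeps the cohort sizes readable in the data itself. The name size_lookup is hypothetical, the size assignment (A = 200, B = 100, C to G = 25) is an assumption read off the question's labels, and the small data frame only stands in for scores_melt (it is the 2014 row of the data).

```r
# Toy stand-in for scores_melt (assumption: same 'variable' column as above)
scores_melt <- data.frame(
  Year     = rep(2014, 7),
  variable = factor(LETTERS[1:7]),
  value    = c(4.73, 4, 4.81, 4.66, 4.33, NA, 4)
)

# Hypothetical lookup: series name -> cohort size
# (assumed from the question's labels: A = 200, B = 100, all others = 25)
size_lookup <- c(A = 200, B = 100, C = 25, D = 25, E = 25, F = 25, G = 25)

# Index the lookup by series name, then fix the legend order explicitly
scores_melt$cohort_size <- factor(
  size_lookup[as.character(scores_melt$variable)],
  levels = c(200, 100, 25)
)

levels(scores_melt$cohort_size)
```

With this variant the legend entries are already the cohort sizes, so no relabelling is needed in the scale.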

Create the plot

library(ggplot2)

# Simplified ggplot call that looks ok on my screen
ggplot(scores_melt, aes(x = Year, y = value, colour = variable, shape = cohort_size)) +
  geom_line() +
  geom_point(size = 5, alpha = 0.6) +
  ylab("Score (5 = max)") +
  # guide = "none" hides the colour legend (guide = FALSE is deprecated in recent ggplot2)
  scale_colour_discrete(guide = "none") +
  theme_bw() +
  theme(legend.position=c(0.1, 0.8))
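Note that this plot uses ggplot2's default shapes for the three cohort_size levels, not the symbols 15, 17 and 16 from the question. If those symbols are wanted, one option is to add scale_shape_manual() with a named values vector. This is only a sketch: it assumes the factor levels are "15", "16" and "17" as created above, the legend labels follow the mapping implied by the question (15 = 200, 17 = 100, 16 = 25), and the data frame toy is an illustrative stand-in for scores_melt.

```r
library(ggplot2)

# Toy stand-in for scores_melt after the cohort_size step (2014 values for A, B, C)
toy <- data.frame(
  Year        = 2014,
  variable    = factor(c("A", "B", "C")),
  value       = c(4.73, 4, 4.81),
  cohort_size = factor(c(15, 17, 16), levels = 15:17)
)

p <- ggplot(toy, aes(x = Year, y = value, colour = variable, shape = cohort_size)) +
  geom_point(size = 5, alpha = 0.6) +
  # Named values tie each factor level to its symbol;
  # breaks and labels set the legend order and text
  scale_shape_manual(
    name   = "Cohort\nSize",
    values = c("15" = 15, "16" = 16, "17" = 17),
    breaks = c("15", "17", "16"),
    labels = c("200", "100", "25")
  ) +
  scale_colour_discrete(guide = "none")

p
```

Naming the values vector avoids depending on the order of the factor levels, so the symbols stay correct even if the levels are reordered later.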


Upvotes: 1
