Nadiah

Reputation: 175

ggplot2; single regression line when colour is coded for by a variable?

I am trying to create a scatterplot in ggplot2 with one regression line even though colour is dependent on the 'Survey Type' variable. I would ideally also like to specify which survey type is which colour (community = red, subnational = green, national = blue).

This is the code I'm running which currently gives me 3 separate regression lines, one for each survey type.

ggplot(data=data.male,aes(x=mid_year, y=mean_tc, colour =condition)) +
geom_point(shape=1) + 
geom_smooth(method=lm, data=data.male, na.rm = TRUE, fullrange= TRUE) 

The condition is:

condition <- (data.male$survey_type)

Even if I move the colour aesthetic to the geom_point function it doesn't work as it gives me an error saying community is not a valid colour name?

My actual data file is really big so I'll just give a small sample here:

data.male dataset:

mid_year mean_tc survey_type
2000     4       Community
2001     5       National
2002     5.1     Subnational
2003     4.3     National
2004     4.5     Community
2005     5.2     Subnational
2006     4.4     National

Upvotes: 10

Views: 6420

Answers (1)

Ben Bolker

Reputation: 226182

data.male <- read.table(header=TRUE,text="
 mid_year mean_tc survey_type
 2000     4       Community
 2001     5       National
 2002     5.1     Subnational
 2003     4.3     National
 2004     4.5     Community
 2005     5.2     Subnational
 2006     4.4     National")
  • Use aes(group=1) in the geom_smooth() specification to ignore the grouping by survey type induced by assigning the colour mapping to survey type. (Alternatively, you can put the colour mapping into geom_point() rather than the overall ggplot() specification.)
  • To map colour, give aes() the name of a column in your data frame (i.e., survey_type) rather than a separate vector such as condition; if you want the legend title to read "condition", you can set that in the colour scale specification (example below).
library(ggplot2); theme_set(theme_bw())
ggplot(data=data.male,aes(x=mid_year, y=mean_tc, colour=survey_type)) +
   geom_point(shape=1) +
   ## use aes(group=1) for single regression line across groups;
   ##   don't need to re-specify data argument
   ##  set colour to black (from default blue) to avoid confusion
   ##  with national (blue) points
   geom_smooth(method=lm, na.rm = TRUE, fullrange= TRUE,
               aes(group=1),colour="black")+
   scale_colour_manual(name="condition",
       values=c("red","blue","green"))
       ## in factor level order; probably better to
       ## specify 'breaks' explicitly ...
  • Out of courtesy to colour-blind people I would suggest not using primary red/green/blue as your colour specifications (try scale_colour_brewer(palette="Dark2") instead).
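
As a rough sketch of the alternative mentioned in the first bullet (colour mapped only inside geom_point()), combined with a named values vector so each survey type is tied to a colour by name rather than by factor-level order. The object names survey_colours, p_base, p_manual and p_brewer are my own, and the legend title "Survey type" is just an assumption about what you want displayed:

```r
library(ggplot2)

data.male <- read.table(header = TRUE, text = "
 mid_year mean_tc survey_type
 2000     4       Community
 2001     5       National
 2002     5.1     Subnational
 2003     4.3     National
 2004     4.5     Community
 2005     5.2     Subnational
 2006     4.4     National")

## named vector: colours are matched to survey types by name,
## so the order of the factor levels no longer matters
survey_colours <- c(Community = "red", Subnational = "green", National = "blue")

## colour is mapped only in geom_point(), so geom_smooth() sees a
## single group and fits one regression line through all points
p_base <- ggplot(data.male, aes(x = mid_year, y = mean_tc)) +
  geom_point(aes(colour = survey_type), shape = 1) +
  geom_smooth(method = lm, formula = y ~ x, colour = "black",
              na.rm = TRUE, fullrange = TRUE)

## red/green/blue as requested in the question
p_manual <- p_base +
  scale_colour_manual(name = "Survey type", values = survey_colours)

## colour-blind-friendlier alternative using a ColorBrewer palette
p_brewer <- p_base +
  scale_colour_brewer(name = "Survey type", palette = "Dark2")
```

Printing p_manual or p_brewer draws the plot. Keeping the two scales on separate copies of p_base avoids the message ggplot2 gives when a second colour scale replaces an existing one.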


Upvotes: 9
