Reputation: 25
I have the following dataset, which contains three variables: species, CO2 and stomatal density. I'd like to create a plot with one line per species, showing how stomatal density changes with CO2. I have used the following code.
# Libraries
library(ggplot2)
library(dplyr)
# Keep the six species of interest
don <- mydata %>%
  filter(Species %in% c("Alnus glutinosa", "Betula pendula", "Betula pubescens",
                        "Corylus avellana", "Quercus petrea", "Sorbus aucuparia"))
# Plot
don %>%
  ggplot(aes(x = CO2, y = SD, group = Species, color = Species)) +  # SD is the stomatal density column in the dput() output below
  geom_line()
The code works, but the resulting lines for each species don't look great.
I tried replacing geom_line() with geom_smooth(), but then no lines appear on the graph. Is there a way to make these lines look better?
Update: Here's my data
dput(mydata)
structure(list(Species = c("Alnus glutinosa", "Alnus glutinosa",
"Alnus glutinosa", "Alnus glutinosa", "Alnus glutinosa", "Alnus glutinosa",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pendula",
"Betula pendula", "Betula pendula", "Betula pendula", "Betula pubescens",
"Betula pubescens", "Betula pubescens", "Betula pubescens", "Betula pubescens",
"Betula pubescens", "Betula pubescens", "Betula pubescens", "Betula pubescens",
"Betula pubescens", "Betula pubescens", "Corylus avellana", "Corylus avellana",
"Corylus avellana", "Corylus avellana", "Corylus avellana", "Corylus avellana",
"Corylus avellana", "Corylus avellana", "Corylus avellana", "Corylus avellana",
"Corylus avellana", "Corylus avellana", "Corylus avellana", "Corylus avellana",
"Corylus avellana", "Corylus avellana", "Corylus avellana", "Corylus avellana",
"Corylus avellana", "Corylus avellana", "Corylus avellana", "Corylus avellana","Quercus petrea", "Quercus petrea", "Quercus petrea", "Quercus petrea",
"Quercus petrea", "Quercus petrea", "Quercus petrea", "Quercus petrea",
"Quercus petrea", "Quercus petrea", "Quercus petrea", "Quercus petrea",
"Quercus petrea", "Quercus petrea", "Quercus petrea", "Sorbus aucuparia",
"Sorbus aucuparia", "Sorbus aucuparia", "Sorbus aucuparia", "Sorbus aucuparia",
"Sorbus aucuparia", "Sorbus aucuparia", "Sorbus aucuparia"),
CO2 = c(356.45, 371.14, 371.14, 371.14, 375.8, 391.65, 358.87,
358.87, 358.87, 358.87, 358.87, 358.87, 358.87, 358.87, 358.87,
361.2, 361.2, 361.2, 361.2, 361.2, 363.55, 363.55, 363.55,
363.55, 337.86, 373.47, 373.47, 373.47, 373.47, 387.63, 389.63,
392.27, 392.27, 392.27, 392.27, 392.27, 392.27, 392.27, 392.27,
392.27, 392.27, 392.27, 392.27, 392.27, 392.27, 392.27, 392.27,
392.27, 392.27, 392.27, 393.83, NA, NA, 354.39, 356.46, 356.46,
358.87, 361.2, 353.83, 387.63, 389.63, 393.83, 409.39, 411.18,
356, 371.14, 371.14, 371.14, 375.8, 389.63, 389.63, 389.63,
389.63, 389.63, 389.63, 389.63, 389.63, 389.63, 389.63, 389.63,
389.63, 389.63, 389.63, 389.63, 389.63, 389.63, 291, 300.6,
300.6, 356.46, 356.46, 356.46, 363.55, 363.55, 366.75, 370.19,
370.19, 406.62, 409.39, 409.39, 409.39, 305, 356, 356, 362.61,
371.14, 371.14, 371.14, 377.52), SD = c(108, 218.75, 218.75,
92.01388889, 107.85, NA, 60, 108.1, 135.6, 128.4, 115.1,
202.6, 102.4, 65.9, 39.3, 45, 79.5, 105.2, 93.9, 75.3, 79.3,
62, 93.9, 81.4, 101, 66.8, 132.81, 132.81, 92.45, 174.6,
160, 243.68, 187.98, 229.76, 222.76, 208.87, 160.13, 194.95,
215.83, 222.79, 201.91, 208.87, 187.98, 250.64, 181.02, 292.42,
264.57, 257.61, 264.57, 243.68, 14, 127, 143, NA, 147, 61,
87.8, 65, 124.5, 111.1, 107, 12.6, 2.99, 2.99, 225, 164.9305556,
164.9305556, 101.5625, 84, 113.64, 95.25, 98.32, 94.38, 107.34, 83.08, 96.45, 91.48, 92.11, 90.8, 99.6, 91.45, 117.73, 83.33,
96.28, 88.26, 110.58, 698, 810, 468, 510, 370, 405, 47.5,
19.6, 4.6, 394.3, 355.1, 333, 215.14, 168.06, 175.33, 118,
224, 132, 132, 157.1180556, 157.1180556, 99.39236111, 73.9
)), class = "data.frame", row.names = c(NA, -109L))
Upvotes: 2
Views: 85
Reputation: 226047
It's hard for R to make very smooth lines since some of your groups have very few unique points:
don %>% group_by(Species) %>% summarise(n=length(unique(CO2)))
# A tibble: 6 x 2
  Species              n
  <chr>            <int>
1 Alnus glutinosa      4
2 Betula pendula      10
3 Betula pubescens    10
4 Corylus avellana     4
5 Quercus petrea       8
6 Sorbus aucuparia     5
Unfortunately I don't know of a super-easy way to create different kinds of flexible smooths (i.e. lines that are smooth but not necessarily as simple as straight lines) for groups with different numbers of points (some large enough for geom_smooth(), some not), or to make geom_smooth() robust so that it skips groups where smoothing fails (rather than just failing). You could add linear regression lines, which will work as long as there are at least two x-values per group:
ggplot(don, aes(x=CO2, y=SD, group=Species, color=Species)) +
  geom_point() + geom_smooth(method="lm")
You can make things a little bit better by drawing a line through the mean values for each unique CO2 level:
ggplot(don, aes(x=CO2, y=SD, group=Species, color=Species)) +
  geom_point() + stat_summary(fun=mean, geom="line")
but there is still a big dip for Quercus petrea (what's going on with those data anyway?)
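To see which values that mean line actually passes through (and to poke at the Quercus petrea dip), the per-level means can also be computed by hand. This is only a sketch: the `toy` data frame and its values are made up for illustration, and with the real data you would start from `don` instead. Rows with a missing CO2 or SD are dropped first, on the assumption that the plot would discard them too.

```r
library(dplyr)

# Hypothetical toy data standing in for `don`
toy <- data.frame(
  Species = c("A", "A", "A", "B", "B"),
  CO2     = c(350, 350, 370, 350, 370),
  SD      = c(100, 120, 90, 60, NA)
)

# One row per Species/CO2 combination: the mean SD and how many
# observations went into it
means <- toy %>%
  filter(!is.na(CO2), !is.na(SD)) %>%
  group_by(Species, CO2) %>%
  summarise(mean_SD = mean(SD), n = n(), .groups = "drop")

means
```

Levels where `n` is 1 are single observations, so a dip at such a level is just one data point rather than an average.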
(You could use geom_smooth(method="lm", formula=y~poly(x,2)), which would give you quadratic fits ... this would be a little more flexible than assuming a straight line ...)
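One possible workaround for the mixed group sizes, sketched under assumptions rather than offered as a definitive recipe: count the unique CO2 values per species, pass only the species with enough of them to a flexible smoother through the `data` argument of geom_smooth(), and let the sparse species fall back to straight lines. The `toy` data frame and the `min_unique` threshold are hypothetical; with the real data you would use `don` and choose the threshold by trial and error.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data standing in for `don`: one species with many
# distinct CO2 values, one with only three
toy <- data.frame(
  Species = rep(c("many", "few"), times = c(12, 6)),
  CO2     = c(seq(300, 410, by = 10), rep(c(350, 370, 390), each = 2)),
  SD      = c(100 + 5 * sin(1:12) + 1:12, 80, 90, 95, 85, 110, 100)
)

min_unique <- 6  # assumed threshold, not a rule

# Species with at least `min_unique` distinct, non-missing CO2 values
enough <- toy %>%
  filter(!is.na(CO2)) %>%
  group_by(Species) %>%
  summarise(n_unique = n_distinct(CO2)) %>%
  filter(n_unique >= min_unique)

flexible <- semi_join(toy, enough, by = "Species")  # gets a loess smooth
sparse   <- anti_join(toy, enough, by = "Species")  # gets a straight line

p <- ggplot(toy, aes(x = CO2, y = SD, colour = Species)) +
  geom_point() +
  geom_smooth(data = flexible, method = "loess", formula = y ~ x, se = FALSE) +
  geom_smooth(data = sparse, method = "lm", formula = y ~ x, se = FALSE)
```

Printing `p` draws the plot. Every species still shows its points, because geom_point() uses the full data frame and only the two smoothing layers are restricted.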
Upvotes: 2
Reputation: 432
The best I can think of is to fit a line through the points and show the individual observations as a scatter plot instead. The line doesn't appear to be a fantastic approximation for all of the groups, but perhaps it's a start.
don %>%
  ggplot(aes(x = CO2, y = SD, group = Species, color = Species)) +
  geom_point() +
  # nls needs starting values for a and b; se = FALSE turns off the confidence band
  geom_smooth(method = "nls", formula = y ~ a * x + b, se = FALSE,
              method.args = list(start = list(a = 0.1, b = 0.1)))
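Worth noting: because the formula y ~ a * x + b is linear in a and b, this nls fit should land on the same least-squares line that lm would give. A quick check on made-up numbers (the `x` and `y` values here are hypothetical, not taken from the question's data):

```r
# Toy data: a straight line plus a little fixed noise (hypothetical values)
x <- c(300, 320, 340, 360, 380, 400)
y <- 0.5 * x + 20 + c(3, -2, 4, -5, 1, -1)

fit_nls <- nls(y ~ a * x + b, start = list(a = 0.1, b = 0.1))
fit_lm  <- lm(y ~ x)

# Slope and intercept from both fits, one row per method
rbind(nls = coef(fit_nls)[c("a", "b")],
      lm  = coef(fit_lm)[c("x", "(Intercept)")])
```

So the nls route becomes more useful than method="lm" once the formula is changed to something genuinely nonlinear in its parameters.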
Upvotes: 2