I'm trying to draw separate lines based on part of a data frame, but when I do, lines are also drawn between points I don't want connected.
Using geom_segment to draw them manually, I get what I'm after (minus the correct legend).
But when I use geom_line, which should both let me reuse my code for different graphs and add the correct legend (I hope), the best I can do still joins points that shouldn't be joined.
My data frame (df):
df <- read.table(text='Treatment Function Time N Rel_abund sd se ci
1 Start "Methanogenesis" Start 3 1.983614e-04 3.839642e-05 2.216818e-05 9.538199e-05
2 Start "Methane oxidation" Start 3 1.245265e-04 2.275417e-05 1.313712e-05 5.652448e-05
3 Start "Sulphate Reduction" Start 3 3.693332e-05 1.247878e-05 7.204626e-06 3.099900e-05
4 "1 x Flood" "Methanogenesis" End 3 1.673369e-04 1.043482e-05 6.024546e-06 2.592153e-05
5 "1 x Flood" "Methane oxidation" End 3 1.269306e-04 2.938948e-05 1.696803e-05 7.300753e-05
6 "1 x Flood" "Sulphate Reduction" End 3 3.742168e-05 2.187629e-06 1.263028e-06 5.434372e-06
7 "3 x Floods" "Methanogenesis" End 3 2.135845e-04 3.762486e-05 2.172272e-05 9.346534e-05
8 "3 x Floods" "Methane oxidation" End 3 9.097189e-05 1.192464e-05 6.884691e-06 2.962244e-05
9 "3 x Floods" "Sulphate Reduction" End 3 8.513220e-05 2.271764e-05 1.311603e-05 5.643374e-05', stringsAsFactors=TRUE)  # keep Function a factor (R >= 4.0 reads strings as character by default)
And my code:
library(ggplot2)
ggplot(df, aes(x=Time, y=Rel_abund,col=Function))+
geom_point(size=2,position=position_dodge(.1))+
geom_errorbar(aes(ymin=Rel_abund-se,ymax=Rel_abund+se),width=0.075,position=position_dodge(.1))+
# group=Function joins every point of a Function, including both End points
geom_line(aes(group=Function),position=position_dodge(.1))
I can see what it's doing: it's connecting all occurrences of each Function. But when I create two new columns with NA in the appropriate rows, it still connects all of the NA occurrences.
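One way to see why the lines join the wrong points is to count the points in each group, since geom_line connects every point within a group in x order. The base-R sketch below rebuilds only the columns it needs from the data above and compares grouping by Function with grouping by Function and Treatment together.

```r
# Columns rebuilt from the question's data (only those needed here)
df <- data.frame(
  Treatment = rep(c("Start", "1 x Flood", "3 x Floods"), each = 3),
  Function  = rep(c("Methanogenesis", "Methane oxidation", "Sulphate Reduction"), 3),
  Time      = rep(c("Start", "End", "End"), each = 3),
  Rel_abund = c(1.983614e-04, 1.245265e-04, 3.693332e-05,
                1.673369e-04, 1.269306e-04, 3.742168e-05,
                2.135845e-04, 9.097189e-05, 8.513220e-05),
  stringsAsFactors = TRUE
)

# geom_line(aes(group = Function)) joins every point that shares a Function
by_function <- table(df$Function)
print(by_function)   # 3 points per Function: one Start and two End

# Grouping by Function x Treatment leaves every point alone in its group,
# because the Start rows carry Treatment "Start", not a flood treatment
by_fun_treat <- table(interaction(df$Function, df$Treatment, drop = TRUE))
print(by_fun_treat)
```

Neither grouping gives two-point Start-to-End lines on its own: each Start point has to be paired with each of its End points, for example by duplicating the Start row once per treatment.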
Ultimately, I'd like a graph that looks similar to my top one, with a legend that includes the different line types for the different treatments (1 x Flood, 3 x Floods), and that uses code that can be easily applied or modified for other datasets.
Thanks in advance!
Here is an approach that uses geom_segment(...) and does not require reshaping the dataset, although I must admit it's a bit of a hack.
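# Add a Start column: every row gets the Start-time Rel_abund of its own Function
start <- setNames(df[df$Time=="Start", c("Function","Rel_abund")], c("Function","Start"))
df <- merge(df, start, by="Function")
# Order the x axis as Start, End
df$Time <- factor(df$Time,levels=c("Start","End"))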
library(ggplot2)
ggplot(df, aes(x=Time, y=Rel_abund,col=Function))+
geom_point(size=2,position=position_dodge(.1))+
geom_errorbar(aes(ymin=Rel_abund-se,ymax=Rel_abund+se),width=0.075,position=position_dodge(.1))+
geom_segment(data=df[df$Time!="Start",],
aes(x=1, xend=2+(as.numeric(Function)-2)*0.04, y=Start, yend=Rel_abund, color=Function , linetype=Treatment),
position=position_dodge(0.1))
So here we add an extra column, Start, to df, which contains the starting value of Rel_abund; in other words, we replicate df[df$Time=="Start",]$Rel_abund for each value of Function. Using merge(...) to do this guarantees that the values are matched correctly to each value of Function. From there it's straightforward to use geom_segment(...), except for one thing: you want to dodge the x-values.
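If you'd rather keep the row order of df unchanged, a swapped-in alternative to merge(...) (not what this answer uses) is a match() lookup on Function. The sketch below rebuilds only the needed columns from the question's data; merge() by contrast sorts its result by the key column.

```r
df <- data.frame(
  Treatment = rep(c("Start", "1 x Flood", "3 x Floods"), each = 3),
  Function  = rep(c("Methanogenesis", "Methane oxidation", "Sulphate Reduction"), 3),
  Time      = rep(c("Start", "End", "End"), each = 3),
  Rel_abund = c(1.983614e-04, 1.245265e-04, 3.693332e-05,
                1.673369e-04, 1.269306e-04, 3.742168e-05,
                2.135845e-04, 9.097189e-05, 8.513220e-05),
  stringsAsFactors = TRUE
)

# For each row, find the Start row with the same Function and copy its Rel_abund
start_rows <- df[df$Time == "Start", ]
df$Start <- start_rows$Rel_abund[match(df$Function, start_rows$Function)]

# Row order of df is untouched
print(df[, c("Function", "Time", "Rel_abund", "Start")])
```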
The problem with dodging geom_segment(...) is that position_dodge(...) is only applied to the x aesthetic, not xend. So the code above hacks around that by adding 0.04*(as.numeric(Function)-2) to xend.
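To see roughly where 0.04 comes from, here is the dodge arithmetic, assuming position_dodge(w) spreads the n groups at one x position evenly across a total width w, with group i centred at w * ((i - 0.5)/n - 0.5). That formula is my reading of ggplot2's dodging, not something stated above, and dodge_offset is a hypothetical helper, not a ggplot2 function.

```r
# Hypothetical helper (not part of ggplot2): centre offset of group i of n
# when a dodge width w is split evenly between the n groups at one x position
dodge_offset <- function(i, n, w) w * ((i - 0.5) / n - 0.5)

functions <- factor(c("Methanogenesis", "Methane oxidation", "Sulphate Reduction"))
idx <- seq_len(nlevels(functions))     # level index, as returned by as.numeric(Function)

exact <- dodge_offset(idx, nlevels(functions), 0.1)
hack  <- 0.04 * (idx - 2)              # offset the answer adds to xend
print(data.frame(Function = levels(functions), exact = round(exact, 4), hack = hack))
```

Under that assumption the matching shift would be 1/30 (about 0.033) rather than 0.04, so the segment ends land slightly outside the dodged points; at this scale the difference is likely hard to see.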
Unfortunately I don't know of any way to do this other than reshaping your data for the lines. Here's the transformation I would do:
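# Build one two-point line (Start -> End) per Function/Treatment combination
ld<-do.call(rbind, lapply(split(df, df$Function), function(x) {
s <- x$Time=="Start"                      # the single Start row of this Function
ids <- paste(x$Function[!s], 1:sum(!s))   # one id per End row
cols <- c("Time","Rel_abund", "Function")
suppressWarnings(rbind(
# the Start row, copied once per End row and given that row's Treatment
cbind(x[s, cols], Treatment=x$Treatment[!s], id=ids),
# the End rows themselves
cbind(x[!s, c(cols, "Treatment")], id=ids)
))
}))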
Which will produce
Time Rel_abund Function Treatment id
1 Start 1.245265e-04 Methane oxidation 1 x Flood Methane oxidation 1
2 Start 1.245265e-04 Methane oxidation 3 x Floods Methane oxidation 2
3 End 1.269306e-04 Methane oxidation 1 x Flood Methane oxidation 1
4 End 9.097189e-05 Methane oxidation 3 x Floods Methane oxidation 2
5 Start 1.983614e-04 Methanogenesis 1 x Flood Methanogenesis 1
6 Start 1.983614e-04 Methanogenesis 3 x Floods Methanogenesis 2
7 End 1.673369e-04 Methanogenesis 1 x Flood Methanogenesis 1
8 End 2.135845e-04 Methanogenesis 3 x Floods Methanogenesis 2
9 Start 3.693332e-05 Sulphate Reduction 1 x Flood Sulphate Reduction 1
10 Start 3.693332e-05 Sulphate Reduction 3 x Floods Sulphate Reduction 2
11 End 3.742168e-05 Sulphate Reduction 1 x Flood Sulphate Reduction 1
12 End 8.513220e-05 Sulphate Reduction 3 x Floods Sulphate Reduction 2
So we've replicated the Start value for each Function/Treatment combination and kept both End values. We also set the Treatment of the Start rows so we can style the lines by it. Finally, we added an id so ggplot will know which Start/End points to connect. It may not be the prettiest transformation, but it gets the job done.
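For comparison, here is a sketch of the same reshape done with merge() instead of split()/lapply(). It is a swapped-in variant, not the code above; it rebuilds only the needed columns from the question's data, and its id strings (Function plus Treatment) differ from the ones in the table above.

```r
df <- data.frame(
  Treatment = rep(c("Start", "1 x Flood", "3 x Floods"), each = 3),
  Function  = rep(c("Methanogenesis", "Methane oxidation", "Sulphate Reduction"), 3),
  Time      = rep(c("Start", "End", "End"), each = 3),
  Rel_abund = c(1.983614e-04, 1.245265e-04, 3.693332e-05,
                1.673369e-04, 1.269306e-04, 3.742168e-05,
                2.135845e-04, 9.097189e-05, 8.513220e-05),
  stringsAsFactors = TRUE
)

# Start value of each Function, renamed so it survives the merge
start <- df[df$Time == "Start", c("Function", "Rel_abund")]
names(start)[2] <- "Start_abund"

# One row per End point, tagged with an id for its Function/Treatment line
end <- df[df$Time == "End", c("Function", "Treatment", "Rel_abund")]
end$id <- paste(end$Function, end$Treatment)
end <- merge(end, start, by = "Function")

# Stack a Start copy and the End point for every id: two points per line
ld <- rbind(
  data.frame(Time = "Start", Rel_abund = end$Start_abund,
             Function = end$Function, Treatment = end$Treatment, id = end$id),
  data.frame(Time = "End", Rel_abund = end$Rel_abund,
             Function = end$Function, Treatment = end$Treatment, id = end$id)
)
ld <- ld[order(ld$id, ld$Time == "End"), ]
print(ld)
```

The result can be passed to geom_line(data=ld, aes(group=id, lty=Treatment)) in the same way as the ld built above.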
Now we can draw the plot with
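library(ggplot2)
# Show Start before End on the x axis (otherwise Time is ordered alphabetically)
df$Time <- factor(df$Time, levels=c("Start","End"))
ld$Time <- factor(ld$Time, levels=c("Start","End"))
ggplot(df, aes(x=Time, y=Rel_abund, col=Function))+
geom_point(size=2,position=position_dodge(.1))+
geom_errorbar(aes(ymin=Rel_abund-se, ymax=Rel_abund+se),
width=0.075, position=position_dodge(.1))+
# only the lines use the reshaped data; group=id gives one line per Start/End pair
geom_line(data=ld, aes(group=id, lty=Treatment),
position=position_dodge(.1)
)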
Notice that we use the original data.frame for the points and error bars, and specify our special dataset just for geom_line(). That gives a plot which is pretty close to your first picture.