RB88

Reputation: 167

Use geom_line to draw lines for subsets of a factor

I'm trying to draw separate lines based on part of a data frame, but when I do, lines are drawn for the points I don't want to connect.

Using geom_segment to manually draw them I get what I'm after (minus correct legend):

[image]

But when I use geom_line, which should both allow me to use my code for different graphs and also add the correct legend (I hope), the best I can do is this:

[image]

My data frame (df):

df <- read.table(text='Treatment           Function  Time N    Rel_abund           sd           se           ci
1      Start     "Methanogenesis" Start 3 1.983614e-04 3.839642e-05 2.216818e-05 9.538199e-05
2      Start  "Methane oxidation" Start 3 1.245265e-04 2.275417e-05 1.313712e-05 5.652448e-05
3      Start "Sulphate Reduction" Start 3 3.693332e-05 1.247878e-05 7.204626e-06 3.099900e-05
4  "1 x Flood"     "Methanogenesis"   End 3 1.673369e-04 1.043482e-05 6.024546e-06 2.592153e-05
5  "1 x Flood"  "Methane oxidation"   End 3 1.269306e-04 2.938948e-05 1.696803e-05 7.300753e-05
6  "1 x Flood" "Sulphate Reduction"   End 3 3.742168e-05 2.187629e-06 1.263028e-06 5.434372e-06
7 "3 x Floods"     "Methanogenesis"   End 3 2.135845e-04 3.762486e-05 2.172272e-05 9.346534e-05
8 "3 x Floods"  "Methane oxidation"   End 3 9.097189e-05 1.192464e-05 6.884691e-06 2.962244e-05
9 "3 x Floods" "Sulphate Reduction"   End 3 8.513220e-05 2.271764e-05 1.311603e-05 5.643374e-05')

And my code:

library(ggplot2)
ggplot(df, aes(x=Time, y=Rel_abund,col=Function))+geom_point(size=2,position=position_dodge(.1))+
  geom_errorbar(aes(ymin=Rel_abund-se,ymax=Rel_abund+se),width=0.075,position=position_dodge(.1))+
  geom_line(aes(group=Function),position=position_dodge(.1))

I can see what it's doing, in that it's connecting all occurrences of each Function, but when I create two new columns with NA in the appropriate rows, it still connects all of the NA occurrences.

Ultimately, I'd like a graph that looks similar to my top one, with a legend that includes the different line types for the different treatments (1 x Flood, 3 x Floods), and that uses code that can be easily applied or modified for other datasets.

Thanks in advance!

Upvotes: 1

Views: 812

Answers (2)

jlhoward

Reputation: 59425

Here is an approach that uses geom_segment(...) and does not require reshaping the dataset, although I must admit it's a bit of a hack.

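df$Start <- merge(df,df[df$Time=="Start",c(3,5)],by="Time")$Rel_abund.y
df$Time  <- factor(df$Time,levels=c("Start","End"))
# as.numeric(Function) below needs a factor; read.table returns character columns in R >= 4.0
df$Function <- factor(df$Function)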

library(ggplot2)
ggplot(df, aes(x=Time, y=Rel_abund,col=Function))+
  geom_point(size=2,position=position_dodge(.1))+
  geom_errorbar(aes(ymin=Rel_abund-se,ymax=Rel_abund+se),width=0.075,position=position_dodge(.1))+
  geom_segment(data=df[df$Time!="Start",], 
               aes(x=1, xend=2+(as.numeric(Function)-2)*0.04, y=Start, yend=Rel_abund, color=Function , linetype=Treatment),
               position=position_dodge(0.1))

So here we add an extra column, Start, to df which contains the starting value of Rel_abund; in other words, we replicate df[df$Time=="Start",]$Rel_abund for each value of Function. Using merge(...) to do this just guarantees that the values are assigned properly for different values of Function. From there it's straightforward to use geom_segment(...), except for one thing: you want to dodge the x-values.

The problem with dodging geom_segment(...) is that position_dodge(...) is only applied to the x aesthetic, not xend. So the code above hacks that by adding 0.04*(as.numeric(Function)-2) to xend.
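
To make the two pieces of this hack concrete, here is a minimal sketch, assuming the question's data (only the columns needed are retyped). It builds the Start column with a match() lookup on Function, which is a swapped-in alternative to the merge(...) call and not the code from this answer, and then prints the xend values that the offset produces. The names starts, end and xend are only for illustration.

```r
# Subset of the question's data (only the columns needed here)
df <- data.frame(
  Treatment = rep(c("Start", "1 x Flood", "3 x Floods"), each = 3),
  Function  = factor(rep(c("Methanogenesis", "Methane oxidation",
                           "Sulphate Reduction"), times = 3)),
  Time      = rep(c("Start", "End", "End"), each = 3),
  Rel_abund = c(1.983614e-04, 1.245265e-04, 3.693332e-05,
                1.673369e-04, 1.269306e-04, 3.742168e-05,
                2.135845e-04, 9.097189e-05, 8.513220e-05)
)

# Look up each row's starting value by Function (alternative to merge)
starts   <- df[df$Time == "Start", ]
df$Start <- starts$Rel_abund[match(df$Function, starts$Function)]

# Factor levels are alphabetical, so the integer codes are
# 1 = Methane oxidation, 2 = Methanogenesis, 3 = Sulphate Reduction
end      <- df[df$Time != "Start", ]
end$xend <- 2 + (as.numeric(end$Function) - 2) * 0.04

print(end[, c("Treatment", "Function", "Start", "Rel_abund", "xend")])
```

With three levels the offsets are -0.04, 0 and +0.04 around x = 2, so a dataset with a different number of Function levels would need the constant 2 and the 0.04 step adjusted by hand.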

Upvotes: 1

MrFlick

Reputation: 206606

Unfortunately I don't know of any way to do this other than reshaping your data for the lines. Here's the transformation I would do:

ld<-do.call(rbind, lapply(split(df, df$Function), function(x) {
    s <- x$Time=="Start"; 
    ids <- paste(x$Function[!s], 1:sum(!s)) 
    cols <- c("Time","Rel_abund", "Function")
    suppressWarnings(rbind(
        cbind(x[s, cols], Treatment=x$Treatment[!s], id=ids), 
        cbind(x[!s, c(cols, "Treatment")], id=ids)
    ))
}))

Which will produce

    Time    Rel_abund           Function  Treatment                   id
1  Start 1.245265e-04  Methane oxidation  1 x Flood  Methane oxidation 1
2  Start 1.245265e-04  Methane oxidation 3 x Floods  Methane oxidation 2
3    End 1.269306e-04  Methane oxidation  1 x Flood  Methane oxidation 1
4    End 9.097189e-05  Methane oxidation 3 x Floods  Methane oxidation 2
5  Start 1.983614e-04     Methanogenesis  1 x Flood     Methanogenesis 1
6  Start 1.983614e-04     Methanogenesis 3 x Floods     Methanogenesis 2
7    End 1.673369e-04     Methanogenesis  1 x Flood     Methanogenesis 1
8    End 2.135845e-04     Methanogenesis 3 x Floods     Methanogenesis 2
9  Start 3.693332e-05 Sulphate Reduction  1 x Flood Sulphate Reduction 1
10 Start 3.693332e-05 Sulphate Reduction 3 x Floods Sulphate Reduction 2
11   End 3.742168e-05 Sulphate Reduction  1 x Flood Sulphate Reduction 1
12   End 8.513220e-05 Sulphate Reduction 3 x Floods Sulphate Reduction 2

So we've replicated the Start value for each Function/Treatment combination. Then we kept both End values. We also update the Treatment for the Start values so we can style based on that. Finally we added an ID so ggplot will know which start/end points to connect. It may not be the prettiest transformation but it gets the job done.
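
For comparison, the same kind of reshaping can be sketched with a match() lookup instead of split/lapply. This is an alternative under stated assumptions, not the code above: it assumes exactly one Start row per Function, retypes only the columns needed from the question's data, and uses the hypothetical names start_rep and ld2. Its id values combine Function and Treatment, so they differ from the numbered ids shown in the table.

```r
# Subset of the question's data (only the columns needed here)
df <- data.frame(
  Treatment = rep(c("Start", "1 x Flood", "3 x Floods"), each = 3),
  Function  = rep(c("Methanogenesis", "Methane oxidation",
                    "Sulphate Reduction"), times = 3),
  Time      = rep(c("Start", "End", "End"), each = 3),
  Rel_abund = c(1.983614e-04, 1.245265e-04, 3.693332e-05,
                1.673369e-04, 1.269306e-04, 3.742168e-05,
                2.135845e-04, 9.097189e-05, 8.513220e-05),
  stringsAsFactors = FALSE
)

cols  <- c("Time", "Rel_abund", "Function", "Treatment")
start <- df[df$Time == "Start", ]
end   <- df[df$Time == "End", cols]

# One id per Function/Treatment pair, i.e. one id per line to draw
end$id <- paste(end$Function, end$Treatment)

# Copy the End rows, then overwrite Time and Rel_abund with the Start
# values so each line has a start point carrying the right Treatment
start_rep           <- end
start_rep$Time      <- "Start"
start_rep$Rel_abund <- start$Rel_abund[match(end$Function, start$Function)]

ld2 <- rbind(start_rep, end)
print(ld2[order(ld2$id), ], row.names = FALSE)
```

Each id ends up with exactly two rows, one Start and one End, which is what geom_line(aes(group=id)) needs to draw one segment per Function/Treatment pair.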

Now we can draw the plot with

library(ggplot2)
ggplot(df, aes(x=Time, y=Rel_abund, col=Function))+
    geom_point(size=2,position=position_dodge(.1))+
    geom_errorbar(aes(ymin=Rel_abund-se, ymax=Rel_abund+se), 
        width=0.075, position=position_dodge(.1))+
    geom_line(data=ld, aes(group=id, lty=Treatment),
        position=position_dodge(.1)
)

Notice that we use the original data.frame for the first two layers (the points and error bars), and specify our special dataset just for the geom_line(). And that will give us

[image]

which is pretty close to your first picture.
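
The per-layer data idea used above can be sketched in isolation. This is a toy example with made-up data frames (pts and segs are hypothetical and not from the question), only meant to show that a layer given its own data argument ignores the data passed to ggplot().

```r
library(ggplot2)

# Toy data, not from the question: three points and one two-point line
pts  <- data.frame(x = c("a", "b", "b"), y = c(1, 2, 3))
segs <- data.frame(x = c("a", "b"), y = c(1, 3), id = "line 1")

p <- ggplot(pts, aes(x = x, y = y)) +
  geom_point() +                           # inherits pts from ggplot()
  geom_line(data = segs, aes(group = id))  # uses segs instead of pts

# The geom_line() layer carries its own data frame
print(is.data.frame(p$layers[[2]]$data))
```

The aesthetics set in ggplot() (here x and y) are still inherited by the geom_line() layer, so the layer-specific data frame needs columns with those names.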

Upvotes: 1
