Ben

Reputation: 21615

How to connect points of different groups by a line using ggplot

df<-data.frame(adjuster=c("Mary","Mary","Bob","Bob"), date=as.Date(c("2012-1-1","2012-2-1","2012-3-1","2012-4-1")), value=c(10,15,25,15))
df
  adjuster       date value
1     Mary 2012-01-01    10
2     Mary 2012-02-01    15
3      Bob 2012-03-01    25
4      Bob 2012-04-01    15

ggplot(df,aes(x=date,y=value,color=adjuster))+geom_line()+geom_point()

In the above graph, notice the disconnect between the February and March points. How do I connect those points with a blue line, leaving the actual March point red? In other words, Mary should be associated with the values from [Jan - Mar) and Bob with those from [Mar - Apr].

EDIT: Turns out my example was overly simple. The answers listed don't generalize to the case where the adjuster changes between two people on more than one occasion. For example, consider

df<-data.frame(adjuster=c("Mary","Mary","Bob","Bob","Mary"), date=as.Date(c("2012-1-1","2012-2-1","2012-3-1","2012-4-1","2012-5-1")), value=c(10,15,25,15,20))
  adjuster       date value
1     Mary 2012-01-01    10
2     Mary 2012-02-01    15
3      Bob 2012-03-01    25
4      Bob 2012-04-01    15
5     Mary 2012-05-01    20

Since I didn't mention this in my original question, I'll pick an answer that simply worked for my original data.

Upvotes: 5

Views: 30943

Answers (4)

PatrickT

Reputation: 10510

I'd like to put forward a solution that does not require modifying the dataframe, that is intuitive (once you think about how the layers are drawn), and does not involve lines overwriting one another. It does, however, have one problem: it does not allow you to modify the linetype. I do not know why that is, so if someone could chime in to enlighten us, it would be great.

Quick answer to the OP:

ggplot(df, aes(x = date, y = value, color = adjuster))+
    geom_line(aes(group = 1, colour = adjuster))+
    geom_point(aes(group = adjuster, color = adjuster, shape = adjuster))

In the OP's dataframe, one can use group=1 to create a group spanning the whole period.
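As a rough check that the group = 1 idea also copes with the five-row data from the question's EDIT, here is a minimal sketch. It relies on geom_line() drawing each segment in the colour of its starting point when the colour varies within a single group. The object names df2 and p_multi are made up for illustration.

```r
library(ggplot2)

# Five-row data from the question's EDIT (the adjuster switches twice)
df2 <- data.frame(
  adjuster = c("Mary", "Mary", "Bob", "Bob", "Mary"),
  date     = as.Date(c("2012-1-1", "2012-2-1", "2012-3-1", "2012-4-1", "2012-5-1")),
  value    = c(10, 15, 25, 15, 20)
)

# group = 1 puts all rows on one path; colour still follows adjuster
p_multi <- ggplot(df2, aes(x = date, y = value, colour = adjuster)) +
  geom_line(aes(group = 1)) +
  geom_point()
```

Printing p_multi draws a single unbroken path whose colour switches at each change of adjuster, so no extra column is needed.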

An example illustrated with figures:

# Create data
df <- structure(list(year = c(1990, 2000, 2010, 2020, 2030, 2040), 
    variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Something", class = "factor"), 
    value = c(4, 5, 6, 7, 8, 9), category = structure(c(1L, 1L, 1L, 
    2L, 2L, 2L), .Label = c("Observed", "Projected"), class = "factor")), .Names = c("year", 
"variable", "value", "category"), row.names = c(NA, 6L), class = "data.frame")

# Load library
library(ggplot2)

The basic plot, similar to the OP, groups data by category both inside geom_point(aes()) and inside geom_line(aes()), with the undesirable result, in this application, that the line does not 'bridge' the two points across the two categories.

# Basic ggplot with geom_point() and geom_line()
p <- ggplot(data = df, aes(x = year, y = value, group = category)) + 
    geom_point(aes(colour = category, shape = category), size = 4) +
    geom_line(aes(colour = category), size = 1)
ggsave(p, file = "ggplot-points-connect_p1.png", width = 10, height = 10)

The key to my solution is to group by variable but to colour by category inside geom_line(aes()):

# Modified version to connect the dots "continuously" while preserving color grouping
p <- ggplot(data = df, aes(x = year, y = value)) + 
    geom_point(aes(group = category, colour = category, shape = category), size = 4) +
    geom_line(aes(group = variable, colour = category), size = 1)
ggsave(p, file = "ggplot-points-connect_p2.png", width = 10, height = 10)

However, sadly, with this approach it is not currently possible to control the linetype, as far as I can make out:

ggplot(data = df, aes(x = year, y = value)) + 
    geom_point(aes(group = category, colour = category, shape = category), size = 4) +
    geom_line(aes(group = variable, colour = category), linetype = "dotted", size = 1)
## Error: geom_path: If you are using dotted or dashed lines, colour, size and linetype must be constant over the line
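One possible workaround, offered as a sketch rather than a definitive fix, is to draw the connecting pieces with geom_segment(). Each segment is then its own object, so a dotted linetype no longer has to stay constant along one path. A plausible reason for the error is that geom_path() would have to restart the dash pattern at every colour change, though that is an assumption. The helper data frame seg, its column names, and p_dotted are hypothetical, and unlike the approach above this does build a second data frame.

```r
library(ggplot2)

# Same shape as the example data above, rebuilt compactly
df <- data.frame(
  year     = c(1990, 2000, 2010, 2020, 2030, 2040),
  value    = c(4, 5, 6, 7, 8, 9),
  category = factor(rep(c("Observed", "Projected"), each = 3))
)

# One row per segment: start point, end point, and the start point's category
seg <- data.frame(
  x        = head(df$year, -1),
  xend     = tail(df$year, -1),
  y        = head(df$value, -1),
  yend     = tail(df$value, -1),
  category = head(df$category, -1)
)

# Points come from df, dotted connecting segments from seg
p_dotted <- ggplot(df, aes(x = year, y = value)) +
  geom_point(aes(colour = category, shape = category), size = 4) +
  geom_segment(
    data = seg,
    aes(x = x, y = y, xend = xend, yend = yend, colour = category),
    linetype = "dotted"
  )
```

Because each segment takes the category of its starting point, the bridge between 2010 and 2020 is drawn in the "Observed" colour.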

Remark: I'm using another dataframe because I'm copy-pasting from something I was doing and that made me visit this question -- this way I can upload my images.

Upvotes: 3

TheComeOnMan

Reputation: 12875

Updated to minimise tinkering with data.frame, added the group = 1 argument

Tinkered around with your data.frame a little. You should be able to automate the tinkering around, I guess. Let me know if you can't. Also, your ggplot command wasn't producing the chart you've posted in the question.

df<-data.frame(
  adjuster=c("Mary","Mary","Bob","Bob"), 
  date=as.Date(c("2012-1-1","2012-2-1","2012-3-1","2012-4-1")), 
  value=c(10,15,25,15)
)

library(data.table)
library(ggplot2)
dt <- data.table(df)
dt[,adjuster := as.character(adjuster)]
dt[,prevadjuster := c(NA,head(adjuster,-1))]
dt[is.na(prevadjuster),prevadjuster := adjuster]


ggplot(dt) +
geom_line(aes(x=date,y=value, color = prevadjuster, group = 1)) +
geom_line(aes(x=date,y=value, color = adjuster, group = 1)) +
geom_point(aes(x=date,y=value, color = adjuster, group = 1))
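The lagging step can also be automated in base R. The sketch below uses a hypothetical helper, lag_first(), that shifts a vector down by one position and repeats the first element, which is what the three data.table lines above compute for prevadjuster.

```r
# Hypothetical helper: shift x down by one position, repeating the first element
lag_first <- function(x) {
  x <- as.character(x)
  if (length(x) == 0) return(x)  # nothing to shift
  c(x[1], head(x, -1))
}

df <- data.frame(
  adjuster = c("Mary", "Mary", "Bob", "Bob"),
  date     = as.Date(c("2012-1-1", "2012-2-1", "2012-3-1", "2012-4-1")),
  value    = c(10, 15, 25, 15)
)

# Each row now carries the adjuster of the previous row
df$prevadjuster <- lag_first(df$adjuster)
```

The resulting data frame can be passed to the same ggplot() call in place of dt.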

Upvotes: 8

Ben

Reputation: 21615

I came up with a solution that combines ideas from Codoremifa and JAponte.

df<-data.frame(adjuster=c("Mary","Mary","Bob","Bob"), date=as.Date(c("2012-1-1","2012-2-1","2012-3-1","2012-4-1")), value=c(10,15,25,15))
df$AdjusterLine<-df$adjuster
df[2:nrow(df),]$AdjusterLine<-df[1:(nrow(df)-1),]$adjuster
ggplot(df) +
  geom_line(aes(x=date, y=value, color=AdjusterLine), lty=2) +
  geom_line(aes(x=date, y=value, color=adjuster)) +
  geom_point(aes(x=date, y=value, color=adjuster))

Upvotes: 2

JAponte

Reputation: 1538

Here is a simple solution. No need to change the original data.frame.

ggplot()+
geom_line(aes_string(x='date',y='value'), data=df, lty=2)+
geom_point(aes_string(x='date',y='value', color='adjuster'), data=df)+
geom_line(aes_string(x='date',y='value', color='adjuster'), data=df)

That's one of my favorite features of ggplot. You can layer your plots one on top of the other pretty cleanly.
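Note that aes_string() is deprecated in recent ggplot2 releases, so on a current installation the same layering can be written with plain aes(). A minimal sketch, assuming the four-row df from the question (p_layers is just an illustrative name):

```r
library(ggplot2)

df <- data.frame(
  adjuster = c("Mary", "Mary", "Bob", "Bob"),
  date     = as.Date(c("2012-1-1", "2012-2-1", "2012-3-1", "2012-4-1")),
  value    = c(10, 15, 25, 15)
)

p_layers <- ggplot(df, aes(x = date, y = value)) +
  geom_line(linetype = 2) +             # dashed line through all points, default colour
  geom_point(aes(colour = adjuster)) +  # points coloured by adjuster
  geom_line(aes(colour = adjuster))     # solid coloured line within each adjuster
```

The dashed layer has no colour mapping, so it forms one group and bridges February and March, while the coloured layers stay split by adjuster.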

Upvotes: 2
