Reputation: 4873
This is my df (almost 100,000 rows and 10 ID values):
Date.time P ID
1 2013-07-03 12:10:00 1114.3 J9335
2 2013-07-03 12:20:00 1114.5 K0904
3 2013-07-03 12:30:00 1114.3 K0904
4 2013-07-03 12:40:00 1114.1 K1136
5 2013-07-03 12:50:00 1114.1 K1148
............
With ggplot I create this graph:
ggplot(df) + geom_line(aes(Date.time, P, group=ID, colour=ID))
No problem with this graph. But since I also have to print it in b/w, separating the lines by colour is not a smart choice.
I tried mapping ID to the line type, but the result is not so exciting.
So my idea is to add a different symbol at the beginning and at the end of every line, so that the different IDs can also be identified on a b/w printout.
I add the lines:
geom_point(data=df, aes(x=min(Date.time), y=P, shape=ID))+
geom_point(data=df, aes(x=max(Date.time), y=P, shape=ID))
But an error occurs. Any suggestions?
Given that every line is composed of around 5000 or 10000 values, it's impossible to plot every value as a different character. A solution could be to plot the lines and then plot points with a different symbol for every ID at regular breaks (for example one character every 500 values). Is it possible to do that?
Upvotes: 0
Views: 569
Reputation: 67778
What about adding the geom_points using a subset of your data with only the min-max time values?
# some data
df <- data.frame(
ID = rep(c("a", "b"), each = 4),
Date.time = rep(seq(Sys.time(), by = "hour", length.out = 4), 2),
P = sample(1:10, 8))
df
# create a subset with min and max time values
# if min(x) and max(x) is the same for each ID:
df_minmax <- subset(x= df, subset = Date.time == min(Date.time) | Date.time == max(Date.time))
# if min(x) and max(x) may differ between ID,
# calculate min and max values *per* ID
# Here I use ddply, but several other aggregating functions in base R will do as well.
library(plyr)
df_minmax <- ddply(.data = df, .variables = .(ID), subset,
Date.time == min(Date.time) | Date.time == max(Date.time))
gg <- ggplot(data = df, aes(x = Date.time, y = P)) +
geom_line(aes(group = ID, colour = ID)) +
geom_point(data = df_minmax, aes(shape = ID))
gg
If you wish to have some control over your shapes, you may have a look at ?scale_shape_discrete (with examples here).
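As a rough sketch of that kind of control, scale_shape_manual lets you pick the symbols yourself. The toy data is recreated here so the snippet runs alone; the shape codes 16 and 17, the name gg_bw and the single-colour lines (for b/w printing) are my own illustrative choices, not part of the question.

```r
library(ggplot2)

# toy data, same layout as above (fixed P values instead of sample())
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  Date.time = rep(seq(Sys.time(), by = "hour", length.out = 4), 2),
  P = c(3, 5, 2, 8, 7, 1, 9, 4))

# first and last time stamp (shared by both IDs in this toy data)
df_minmax <- subset(df, Date.time == min(Date.time) | Date.time == max(Date.time))

gg_bw <- ggplot(data = df, aes(x = Date.time, y = P)) +
  geom_line(aes(group = ID)) +
  geom_point(data = df_minmax, aes(shape = ID), size = 3) +
  # one shape code per ID: 16 = filled circle, 17 = filled triangle
  scale_shape_manual(values = c(a = 16, b = 17)) +
  theme_bw()
gg_bw
```

With 10 IDs you would need 10 entries in values, one per ID level.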
Edit following updated question
For each ID, add a shape to the line at some interval.
# create a slightly larger data set
df <- data.frame(
ID = rep(c("a", "b"), each = 100),
Date.time = rep(seq(Sys.time(), by = "day", length.out = 100), 2),
P = c(sample(1:10, 100, replace = TRUE), sample(11:20, 100, replace = TRUE)))
# for each ID:
# create a time sequence from min(time) to max(time), by some time step
# e.g. a week
df_gap <- ddply(.data = df, .variables = .(ID), summarize,
Date.time =
seq(from = min(Date.time), to = max(Date.time), by = "week"))
# add P from df to df_gap
df_gap <- merge(x = df_gap, y = df)
gg <- ggplot(data = df, aes(x = Date.time, y = P)) +
geom_line(aes(group = ID, colour = ID)) +
geom_point(data = df_gap, aes(shape = ID)) +
# if your gaps are not a multiple of the length of the data
# you may wish to add the max points as well
geom_point(data = df_minmax, aes(shape = ID))
gg
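If you would rather place a marker every n rows than every fixed time step (the "one character every 500 values" idea in the question), a base R sketch could number the rows within each ID and keep every n-th one. The step n = 25 and the names row_in_id, df_every and gg_every are only illustrative, and the sketch assumes the rows are already sorted by time within each ID.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  ID = rep(c("a", "b"), each = 100),
  Date.time = rep(seq(as.POSIXct("2013-07-03 12:10:00", tz = "UTC"),
                      by = "10 min", length.out = 100), 2),
  P = c(sample(1:10, 100, replace = TRUE), sample(11:20, 100, replace = TRUE)))

n <- 25  # marker spacing in rows; something like 500 may suit much longer lines

# running row number within each ID
row_in_id <- ave(seq_len(nrow(df)), df$ID, FUN = seq_along)

# keep rows 1, n + 1, 2n + 1, ... of every ID
df_every <- df[(row_in_id - 1) %% n == 0, ]

gg_every <- ggplot(data = df, aes(x = Date.time, y = P)) +
  geom_line(aes(group = ID)) +
  geom_point(data = df_every, aes(shape = ID), size = 3)
gg_every
```

Because the subset is taken by row position, no merge step is needed, but the markers will only be evenly spaced in time if the observations are.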
Upvotes: 3
Reputation: 300
The error stems from the fact that the single numeric value min(Date.time) doesn't match up in length with the vectors P or ID. Another problem might be that you're re-declaring your data variable even though you already have ggplot(df).
The solution that immediately comes to mind is to figure out what the row indexes are for your minimum and maximum dates. If they all share the same minimum and maximum time stamps, then it's easy. Use the which() function to come up with an array of the row numbers you'll need.
min.index <- which(df$Date.time == min(df$Date.time))
max.index <- which(df$Date.time == max(df$Date.time))
Then use those arrays as your indexes.
# subset the rows via data=; indexing inside aes() leaves the aesthetics shorter than the layer data
geom_point(data=df[min.index, ], aes(x=Date.time, y=P, shape=ID))+
geom_point(data=df[max.index, ], aes(x=Date.time, y=P, shape=ID))
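If the IDs do not share the same first and last time stamps, one possible sketch is to compute the minimum and maximum per ID with ave() before calling which(). The small data frame and the helper names t_num and gg_ends below are made up for illustration.

```r
library(ggplot2)

# toy data: ID "b" starts and ends one step later than ID "a"
df <- data.frame(
  ID = c("a", "a", "a", "b", "b", "b"),
  Date.time = as.POSIXct("2013-07-03 12:00:00", tz = "UTC") +
    600 * c(0, 1, 2, 1, 2, 3),
  P = c(1114.3, 1114.5, 1114.3, 1114.1, 1114.1, 1114.4))

# ave() repeats each ID's min/max along that ID's rows
t_num <- as.numeric(df$Date.time)
min.index <- which(t_num == ave(t_num, df$ID, FUN = min))
max.index <- which(t_num == ave(t_num, df$ID, FUN = max))

gg_ends <- ggplot(df) +
  geom_line(aes(Date.time, P, group = ID)) +
  geom_point(data = df[c(min.index, max.index), ],
             aes(Date.time, P, shape = ID), size = 3)
gg_ends
```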
Upvotes: 1