user1807857

Reputation: 409

ggplot: color lines by group without connecting across missing values

I am plotting a time series of wind speed and would like to color the line according to season. There are a few gaps in the data set, one of them a couple of months long. When I color the plot by season, ggplot unfortunately draws a connecting line from the end of one season (e.g. winter) to the next time that season (e.g. winter) appears. How can I stop it from doing that?

Here is an excerpt from my data:

     date                     wspd_havg10m_kn   avg_wd  season
1   2013-12-06 00:25:00     9.8358531   50  Winter
2   2013-12-06 01:25:00     10.5064795  56  Winter
3   2013-12-06 02:25:00     11.8477322  55  Winter
4   2013-12-06 03:25:00     NA              53  NA
5   2013-12-06 04:25:00     13.1889849  47  Winter
6   2013-12-06 05:25:00     13.1889849  60  Winter
7   2013-12-06 06:25:00     NA              51  NA
8   2013-12-06 07:25:00     9.6123110   50  Winter
9   2013-12-06 08:25:00     7.6004320   53  Winter
10  2013-12-06 09:25:00     9.6123110   52  Winter
11  2013-12-06 10:25:00     8.2710583   66  Winter


# add column that specifies the season
mydata$season<-time2season(mydata$date, out.fmt="seasons", type="default")

#capitalize season categories
mydata$season<-capitalize(mydata$season)


g<-ggplot(mydata, aes(date, wspd_havg10m_kn, color=season))+
  geom_line(size=0.1) +
  geom_smooth(colour = "black", size = 1, method = "gam", formula = y ~ s(x, bs = "cs")) +
  scale_y_continuous(limits = c(0,45), breaks = seq(0,45,5))+ 
  scale_color_discrete(name="Season", breaks=c("Spring","Summer","Autumm", "Winter"))+
  xlab("\nSampling Period (mm/yy)\n") +  
  ylab("Hourly Wind Speed Sample (kt)\n")

# adjust the way labels and ticks are set on the x axis:
g+ scale_x_datetime(breaks = date_breaks ("2 months"), labels= date_format ("%m/%y"), limits=c(start_date, end_date))

I tried setting the season to NA wherever the wind speed was missing, but that didn't change anything. I am still left with lines connecting the end of one season to the next occurrence of that season.

any ideas? cheers sandra

Upvotes: 4

Views: 1840

Answers (1)

Mike Wise

Reputation: 22827

I don't see it as an exact duplicate because of the colors and the NAs. I think you are looking for something like this:

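# Read the data
library(ggplot2)
library(lubridate)
df <- read.csv("data.csv",
               strip.white = TRUE,
               colClasses = c("character", "numeric", "numeric", "factor"))

df$date <- ymd_hms(df$date, tz = "UTC")

# Define a group variable that increments at every missing value, so each
# run of consecutive non-missing observations gets its own group id
df$grp <- cumsum(is.na(df$wind))

# Drop the incomplete rows and draw one line per group
ggplot(data = df[complete.cases(df), ], aes(date, wind, color = season)) +
  geom_line(aes(group = grp)) +
  scale_color_manual(values = c("Fall" = "brown", "Winter" = "darkblue"))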
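
To see why the grouping works, it may help to look at what cumsum(is.na(...)) produces on its own. The snippet below is only a sketch in base R: the wind vector is a rounded copy of the sample values, and keep and runs are names made up for this illustration. Each NA bumps the counter, so the observations on either side of a gap end up with different group ids and geom_line has nothing to connect across.

```r
# Rounded copy of the sample wind speeds, with the two gaps as NA
wind <- c(9.8, 10.5, 11.8, NA, 13.2, 13.2, NA, 9.6, 7.6, 9.6, 8.3)

# The running count of NAs increases by one at every missing value
grp <- cumsum(is.na(wind))
print(grp)

# Dropping the NA rows leaves one group id per uninterrupted run
keep <- !is.na(wind)              # illustrative helper name
runs <- split(wind[keep], grp[keep])
print(runs)
```

The NA rows themselves carry the id of the run that follows them, which is harmless because complete.cases() removes them before plotting.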

Here is the data I used:

date,wind,temp,season
2013-12-20 18:25:00,     9.8358531,   50,  Fall
2013-12-20 19:25:00,     10.5064795,  56,  Fall
2013-12-20 20:25:00,     11.8477322,  55,  Fall
2013-12-20 21:25:00,     NA,          53,  NA
2013-12-20 22:25:00,     13.1889849,  47,  Fall
2013-12-20 23:25:00,     13.1889849,  60,  Fall
2013-12-21 01:25:00,     NA,          51,  NA
2013-12-21 02:25:00,     9.6123110,   50,  Winter
2013-12-21 03:25:00,     7.6004320,   53,  Winter
2013-12-21 04:25:00,     9.6123110,   52,  Winter
2013-12-21 05:25:00,     8.2710583,   66,  Winter


Upvotes: 3
