pacomet

Reputation: 5141

Cannot add an additional geom_line to a plot

I used the tidyverse and lubridate packages to plot a multi-line time series, with one line per year. Now I want to add one more line showing the average year, but R throws an error.

The original data are a daily time series with three columns: date, value, and day of year:

str(datos)
'data.frame':    13379 obs. of  3 variables:
 $ fecha: Date, format: "1982-01-01" "1982-01-02" "1982-01-03" ...
 $ SSTm : num  15.7 15.9 16.2 16.1 16 ...
 $ day  : num  1 2 3 4 5 6 7 8 9 10 ...

Then I use this code to arrange the data for plotting:

library(tidyverse)
library(lubridate)

df <- as_tibble(datos) %>%
  rename_all(tolower) %>%
  mutate(fecha = ymd(fecha))

# Define the plot: all years with different colour
p <- df %>%
  mutate(
    year = factor(year(fecha)),     # use year to define separate curves
    date = update(fecha, year = 1)  # use a constant year for the x-axis
  ) %>%
  ggplot(aes(date, sstm, color = year)) +
  scale_x_date(date_breaks = "1 month", date_labels = "%m") +
  xlab(" ") +
  ylab("SST (°C)") +
  theme_bw() +
  ggtitle("Mediterranean daily SST average (1982-2018)")

and draw it with p + geom_line()


Structure of p$data:

head(p$data)
# A tibble: 6 x 5
  fecha       sstm   day year  date      
  <date>     <dbl> <dbl> <fct> <date>    
1 1982-01-01  15.7     1 1982  1-01-01   
2 1982-01-02  15.9     2 1982  1-01-02   
3 1982-01-03  16.2     3 1982  1-01-03   
4 1982-01-04  16.1     4 1982  1-01-04   
5 1982-01-05  16.0     5 1982  1-01-05   
6 1982-01-06  15.9     6 1982  1-01-06   
> str(p$data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    13379 obs. of  5 variables:
 $ fecha: Date, format: "1982-01-01" "1982-01-02" "1982-01-03" ...
 $ sstm : num  15.7 15.9 16.2 16.1 16 ...
 $ day  : num  1 2 3 4 5 6 7 8 9 10 ...
 $ year : Factor w/ 37 levels "1982","1983",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ date : Date, format: "1-01-01" "1-01-02" "1-01-03" ...

Then the average year is built from datos by grouping by day and averaging:

datos.daily.mean <- datos %>%
  group_by(day) %>%
  summarise(sstm = mean(SSTm))
datos.daily.mean$fecha <- as.Date(datos.daily.mean$day, origin = "1970-01-01")

Data structure for datos.daily.mean

head(datos.daily.mean)
# A tibble: 6 x 3
    day  sstm fecha     
  <dbl> <dbl> <date>    
1     1  16.2 1970-01-02
2     2  16.2 1970-01-03
3     3  16.2 1970-01-04
4     4  16.1 1970-01-05
5     5  16.1 1970-01-06
6     6  16.0 1970-01-07

> str(datos.daily.mean)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    366 obs. of  3 variables:
 $ day  : num  1 2 3 4 5 6 7 8 9 10 ...
 $ sstm : num  16.2 16.2 16.2 16.1 16.1 ...
 $ fecha: Date, format: "1970-01-02" "1970-01-03" "1970-01-04" ...

datos.daily.mean can be plotted with

ggplot() +
  geom_line(data = datos.daily.mean, aes(x = fecha, y = sstm)) +
  scale_x_date(date_breaks = "1 month", date_labels = "%m") +
  xlab(" ")

But if I try to combine both plots by adding a new geom_line for the average year, I get an error message about the date format:

p + geom_line() +
  geom_line(data = datos.daily.mean, aes(x = fecha, y = sstm), colour = "blue")

Error in charToDate(x): character string is not in a standard unambiguous format

But I think the dates are in a standard format in both data sets. Any ideas or help would be appreciated. Thanks.

Upvotes: 0

Views: 1084

Answers (1)

pacomet

Reputation: 5141

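I found the solution to my question. My mistake was mixing fecha and date: both hold dates, but the main plot maps x to date (every date moved to year 1), while datos.daily.mean only had fecha (dates in 1970). I added date (and year) to datos.daily.mean with the same mutate call and it worked.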

q <- datos.daily.mean %>% 
  mutate(
    year = factor(year(fecha)),     # use year to define separate curves
    date = update(fecha, year = 1)  # use a constant year for the x-axis
  )

and

p + geom_line(aes(group = year), color = "black", alpha = 0.1) +
  geom_line(data = function(x) filter(x, year == 2018), size = 1.5) +
  geom_line(data = q, aes(x = date, y = sstm), colour = "black")
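
A self-contained sketch of the same idea on made-up data (demo, demo_mean and p_demo are illustrative names, not part of the original data). It assumes the mean can be computed directly on the shared date column, and it uses inherit.aes = FALSE so that the extra layer does not pick up color = year from ggplot(); the code above instead gives q its own year column.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Made-up data standing in for datos: three non-leap years of daily values
set.seed(1)
demo <- tibble(fecha = seq(ymd("2001-01-01"), ymd("2003-12-31"), by = "day")) %>%
  mutate(
    sstm = 20 - 6 * cos(2 * pi * yday(fecha) / 365) + rnorm(n(), sd = 0.3),
    year = factor(year(fecha)),      # one curve per year
    date = update(fecha, year = 1)   # same constant-year x variable as in p
  )

# Average over years on the shared date column,
# so the mean layer uses exactly the same x values as the yearly curves
demo_mean <- demo %>%
  group_by(date) %>%
  summarise(sstm = mean(sstm))

p_demo <- ggplot(demo, aes(date, sstm, color = year)) +
  geom_line() +
  # inherit.aes = FALSE: this layer ignores color = year, which demo_mean lacks
  geom_line(data = demo_mean, aes(x = date, y = sstm),
            inherit.aes = FALSE, colour = "black") +
  scale_x_date(date_breaks = "1 month", date_labels = "%m")
```

Printing p_demo draws the plot. The synthetic years are deliberately non-leap; with real data, how 29 February behaves after update(fecha, year = 1) is worth checking separately.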

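
A side note on the conversion used to build fecha in datos.daily.mean: as.Date() with a numeric argument counts days after the origin, so day 1 becomes 1970-01-02, as the head() output in the question shows. If the mean curve should line up with the yearly curves to the day, subtracting 1 is one possible adjustment (a sketch in base R, not part of the original solution):

```r
day <- c(1, 2, 365)

# As in the question: day 1 lands one day after the origin
shifted <- as.Date(day, origin = "1970-01-01")

# Possible adjustment: day 1 maps to the origin itself
aligned <- as.Date(day - 1, origin = "1970-01-01")

format(shifted)  # "1970-01-02" "1970-01-03" "1971-01-01"
format(aligned)  # "1970-01-01" "1970-01-02" "1970-12-31"
```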

Upvotes: 0
