Alessandra

Reputation: 5

Proper x axis scale with years only

I have a grid with two plots, each consisting of two time series of mean values: one, df5, comes from processing I did in R; the other, mmzep, does not (I received that dataset already calculated).

library(dplyr)
library(lubridate)
library(ggplot2)  # needed for the plots further down
df5 <- data.frame(df$Date, df$Price)
colnames(df5)<- c("date","price")
df5$date <- as.Date(df5$date,"%Y/%m/%d")
df5$price<- as.numeric(gsub(",",".",df5$price))
colnames(mmzep)<- c("date","Mar","Apr")

Then I created two other data frames from df5. I tried to combine them into a single data frame, but I was not able to do it.

meanM <- df5 %>%
          mutate(Month = month(date), Year = year(date)) %>%
          filter(Month == 3, Year %in% 2010:2019) %>%
          group_by(Year, Month)  %>%
          summarise_all(list(mean=mean,  sd=sd), na.rm=TRUE) %>%
          na.omit()

   Year Month date_mean     price_mean date_sd    price_sd 
  <dbl> <dbl> <date>             <dbl>   <dbl>       <dbl>
1  2010     3 2010-03-23         1082.    5.48        685.
2  2012     3 2012-03-27          858.    2.74        333.
3  2015     3 2015-03-16          603.    8.86        411.
4  2017     3 2017-03-15          674.    9.65        512.
5  2018     3 2018-03-16          318.    9.09        202.
6  2019     3 2019-03-14          840.    9.42        329.

meanA <- df5 %>%
          mutate(Month = month(date), Year = year(date)) %>%
          filter(Month == 4, Year %in% 2010:2019) %>%
          group_by(Year, Month)  %>%
          summarise_all(list(mean=mean, sd=sd), na.rm=TRUE) %>%
          na.omit()

   Year Month date_mean     price_mean date_sd    price_sd 
  <dbl> <dbl> <date>             <dbl>   <dbl>       <dbl>
1  2010     4 2010-04-18          361.    9.00        334.
2  2011     4 2011-04-14          527.    8.36        312.
3  2012     4 2012-04-15          726.    8.80        435.
4  2013     4 2013-04-16          872.    8.50        521.
5  2014     4 2014-04-09          668.    5.34        354.
6  2015     4 2015-04-15          689.    8.80        436.
7  2017     4 2017-04-15          806.    8.80        531.
8  2018     4 2018-04-15          727.    8.80        291.
9  2019     4 2019-04-15          600.    8.94        690.

# mmzep
    date   Mar   Apr
   <dbl> <dbl> <dbl>
 1  2010  793.  540 
 2  2011  650   378.
 3  2012  813.  612.
 4  2013  755.  717 
 5  2014  432.  634 
 6  2015  474.  782.
 7  2016  590   743.
 8  2017  544.  628 
 9  2018  249.  781 
10  2019  547.  393 

Then I plot the data frames:

g5 = ggplot() + 
  geom_point(data=meanM, aes(x = (Year), y = (price_mean)),size = 3, colour="gray40") +
  geom_point(data=mmzep, aes(x= (date), y=(Mar)), size =3, colour = "red") +
  geom_line(data=meanM, aes(group = 1, x = (Year), y = (price_mean)), colour="gray40") +  
  geom_line(data=mmzep, aes(x = (date), y = (Mar)), colour="red") +  
  stat_smooth(data=meanM,aes(group = 1, x = (Year), y = (price_mean)), 
              method = "lm", size = 1, se = FALSE, formula = y ~ x, 
              colour = "black") +
  stat_smooth(data=mmzep, aes(x = (date), y = (Mar)), 
              method = "lm", size = 1, se = FALSE, formula = y ~ x,
              colour = "red3") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 1500))  +
  theme(panel.background = element_rect(fill = 'white', colour = 'black'),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.ticks.length = unit(-0.25, "lines"),
        plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"),
        axis.text.x = element_text(margin = margin(t = 0.25, unit = "cm")),
        axis.text.y = element_text(margin = margin(r = 0.25, unit = "cm"))) +
  labs(y = expression(March), 
       x = NULL) +
  theme(axis.text.x = element_text(size=10), 
        axis.title = element_text(size=10))

I build g5 and g6 in the same way, then arrange them in a grid, to obtain this: [image]

As you can see, the x axis is not correct. I tried scale_x_date(breaks="year", labels=date_format("%Y")), scale_x_discrete(labels=c("2010","2011","2012","2013","2014","2015","2016","2017","2018","2019")), and scale_x_continuous in different ways. I also tried mmzep$date <- as.Date(mmzep$date,"%Y"), but I saw that R needs a day (in my case a day and a month?), so I tried mmzep$date <- as.Date(paste("01", mmzep$date, sep="/"), "%d/%m/%Y"), but R replaces the years with NA. I think the error is in the way R sees the date in mmzep, but I don't understand how to make R recognize it as the correct type of object.

Does anyone have any suggestions? Thanks in advance!

Upvotes: 0

Views: 76

Answers (2)

Ben Norris

Reputation: 5747

There are a few ways to do this. In your data, your year values are stored as type double. This tells ggplot that you have a continuous variable. If you want to leave your data as is, then the solution is

+ scale_x_continuous(breaks = seq(2010, 2020, 2)) 
# or something else that expressly lists the years you want to see on the axis.
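
As a minimal sketch of this option, the block below lists one break per year. The data frame `years_df` and the name `year_breaks` are made up for illustration; the values are the rounded Mar column from the mmzep printout in the question.

```r
library(ggplot2)

# Hypothetical stand-in data: numeric years, as in meanM$Year and mmzep$date
years_df <- data.frame(
  Year  = 2010:2019,
  value = c(793, 650, 813, 755, 432, 474, 590, 544, 249, 547)
)

# One break for every year covered by the data
year_breaks <- seq(min(years_df$Year), max(years_df$Year), by = 1)

p <- ggplot(years_df, aes(x = Year, y = value)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = year_breaks)  # whole years only on the axis
```

Because the x values stay numeric, years with no observation still take up their own space on the axis, so gaps in the series remain visible.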

You cannot use scale_x_date without your year data being converted to a date. You can do that with, for example

meanM$Year <- as.Date(paste(meanM$Year, "01", "01", sep = "/"))

Then you can use + scale_x_date(date_labels = "%Y")
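
Here is a small sketch of the same conversion applied to a numeric year column like mmzep$date. The data frame `mm` and the vector `bad` are hypothetical names, and the values are rounded from the printout in the question. It also shows why the attempt in the question probably returned NA: the string "01/2010" has no month, so the format "%d/%m/%Y" cannot match it.

```r
library(ggplot2)

# Hypothetical stand-in for mmzep: years stored as plain numbers
mm <- data.frame(date = 2010:2013, Mar = c(793, 650, 813, 755))

# "01/2010" has a day and a year but no month, so parsing fails and gives NA
bad <- as.Date(paste("01", mm$date, sep = "/"), "%d/%m/%Y")

# Supplying year, month and day gives a valid Date (1 January of each year)
mm$date <- as.Date(paste(mm$date, "01", "01", sep = "-"))

p <- ggplot(mm, aes(x = date, y = Mar)) +
  geom_point() +
  geom_line() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y")
```

If several data frames share the plot, the x column of each one would need the same conversion, since all layers use the same x scale.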

Or you can convert your years into discrete data with factor. You cannot use scale_x_discrete on a continuous variable.

meanM$Year <- factor(meanM$Year)

And then use

+ scale_x_discrete()
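
A minimal sketch of the factor option follows. The data frame `yrs` is a made-up stand-in that uses the first three rows of the meanM printout from the question.

```r
library(ggplot2)

# Hypothetical stand-in for meanM: only some years are present
yrs <- data.frame(Year = c(2010, 2012, 2015),
                  price_mean = c(1082, 858, 603))

# Turn the numeric years into discrete levels
yrs$Year <- factor(yrs$Year)

p <- ggplot(yrs, aes(x = Year, y = price_mean)) +
  geom_point() +
  geom_line(aes(group = 1)) +  # with a discrete x, group = 1 connects the points
  scale_x_discrete()
```

One trade-off with this choice: a discrete axis spaces the levels evenly, so missing years (2011, 2013 and 2014 here) do not show up as gaps.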

Upvotes: 1

Duck

Reputation: 39585

Try this approach, tested on meanM only, since we do not have the data for mmzep. The issue is that, because you are using multiple geoms, the functions add strange labels to the axis. Changing all x-axis variables to factors can alleviate the issue. For mmzep, which uses aes(x = (date), ...), also be careful to format the date as a year, with code like aes(x = factor(format(date, '%Y'))), so that all the labels fit on the axis. Here is the code:

#Code
ggplot() + 
  geom_point(data=meanM, aes(x = factor(Year), y = (price_mean)),size = 3, colour="gray40") +
  geom_line(data=meanM, aes(group = 1, x = factor(Year), y = (price_mean)), colour="gray40") +  
  stat_smooth(data=meanM,aes(group = 1, x = factor(Year), y = (price_mean)), 
              method = "lm", size = 1, se = FALSE, formula = y ~ x, 
              colour = "black") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 1500))  +
  theme(panel.background = element_rect(fill = 'white', colour = 'black'),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.ticks.length = unit(-0.25, "lines"),
        plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"),
        axis.text.x = element_text(margin = margin(t = 0.25, unit = "cm")),
        axis.text.y = element_text(margin = margin(r = 0.25, unit = "cm"))) +
  labs(y = expression(March), 
       x = NULL) +
  theme(axis.text.x = element_text(size=10), 
        axis.title = element_text(size=10))

Output:

enter image description here

Some data used:

#Data
meanM <- structure(list(Year = c(2010L, 2012L, 2015L, 2017L, 2018L, 2019L
), Month = c(3L, 3L, 3L, 3L, 3L, 3L), date_mean = c("23/03/2010", 
"27/03/2012", "16/03/2015", "15/03/2017", "16/03/2018", "14/03/2019"
), price_mean = c(1082L, 858L, 603L, 674L, 318L, 840L), date_sd = c(5.48, 
2.74, 8.86, 9.65, 9.09, 9.42), price_sd = c(685L, 333L, 411L, 
512L, 202L, 329L), Year2 = structure(1:6, .Label = c("2010", 
"2012", "2015", "2017", "2018", "2019"), class = "factor")), row.names = c(NA, 
-6L), class = "data.frame")

Upvotes: 1
