Data miner123

Reputation: 119

Plot Y values against time, grouped by ID

I want to make a time series plot grouped by ID. My dataset has 42 different IDs with 7 different timeframes. The timeframe varies per ID and ranges from 9/2016 to 8/2018. For example, ID1 can start in 10/2016 and end in 7/2017 (7 rows, each with a different date), while ID40 can start in 11/2016 and end in 6/2018 (also 7 rows, each with a different date). I tried to plot this with the following code:

p <- ggplot(data = df6, aes(x = START, y = AI, col = ID, group = ID))
p + geom_point(size = 1.2, alpha = .8) +
  stat_smooth(aes(group = 1)) +
  stat_summary(aes(group = 1), geom = "point", fun.y = mean,
               shape = 17, size = 3) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

This gives me the following graph:

[image: the resulting plot]

As one can see, the x-axis is not chronological. It should start at 09/2016 and end at 08/2018, with each Y value plotted at its date for the corresponding ID. Here is a sample of my dataset:

structure(list(ID = c("ID1", "ID1", "ID1", "ID1", "ID1", "ID1", 
"ID1", "ID10", "ID10", "ID10", "ID10", "ID10", "ID10", "ID10", 
"ID11", "ID11", "ID11", "ID11", "ID11", "ID12"), Time = c("1", 
"2", "3", "4", "5", "6", "7", "1", "2", "3", "4", "5", "6", "7", 
"1", "2", "3", "4", "5", "1"), AI = c(0.393672183448241, 0.4876954603533, 
0.411717908455957, 0.309769862660288, 0.149826889496538, 0.2448558592586, 
0.123606753324621, 0.296109333767922, 0.309960002123076, 0.445886231347992, 
0.370013553008003, 0.393414429902431, 0.318940511323733, 0.131112361225666, 
0.31961673567578, 0.227268892979164, 0.433471105477564, 0.207184572401005, 
0.144257239122978, 0.520204263001733), AI_VAR = c(0.154977788020905, 
0.237846862049217, 0.169511636143347, 0.0959573678125739, 0.0224480968162077, 
0.0599543918132674, 0.0152786294674538, 0.0876807375444826, 0.0960752029161373, 
0.198814531305715, 0.136910029409606, 0.154774913655455, 0.101723049763444, 
0.0171904512661696, 0.102154857724042, 0.0516511497159746, 0.187897199283942, 
0.0429254470409874, 0.020810151039384, 0.270612475245176), activity = c(0, 
0.303472222222222, 0.232638888888889, 0.228472222222222, 0.348611111111111, 
0.215972222222222, 0.123611111111111, 0.357638888888889, 0.235416666666667, 
0.233333333333333, 0.2875, 0.353472222222222, 0.356944444444444, 
0.149305555555556, 0.448611111111111, 0.213888888888889, 0.248611111111111, 
0.288888888888889, 0.25625, 0.238888888888889), ZIM_SD = c(0, 
0.148002025121106, 0.095781596758851, 0.0707738088994687, 0.0522313184217097, 
0.0528820640482116, 0.0152791681192935, 0.105900213118389, 0.0729697504998075, 
0.104040120647865, 0.106378896489801, 0.139061072791901, 0.113844043625277, 
0.0195758039329988, 0.143383618921218, 0.0486102909983211, 0.107765733167339, 
0.059853320915846, 0.036965917525263, 0.124271018383747), ZIM_VAR = c(0, 
0.0721799157746582, 0.039434998686126, 0.0219235930627339, 0.00782565597342798, 
0.0129484832318932, 0.00188860836472692, 0.0313580415523671, 
0.0226177040198407, 0.0463900573046668, 0.0393616334552618, 0.0547086326740462, 
0.0363094774850072, 0.00256662987654616, 0.0458278042289798, 
0.0110476070225835, 0.0467133314886466, 0.0124006847007297, 0.00533260120384214, 
0.0646463135307921), CHECK = c(10L, 13L, 11L, 7L, 7L, 5L, 4L, 
36L, 36L, 34L, 34L, 32L, 29L, 21L, 28L, 27L, 26L, 25L, 21L, 36L
), BULBAR = c(2L, 4L, 4L, 4L, 4L, 2L, 2L, 9L, 9L, 9L, 9L, 9L, 
7L, 6L, 12L, 12L, 11L, 11L, 11L, 11L), FINE = c(0L, 0L, 0L, 0L, 
0L, 0L, 0L, 9L, 9L, 8L, 8L, 7L, 6L, 4L, 2L, 1L, 1L, 1L, 0L, 7L
), GROSS = c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 9L, 9L, 9L, 9L, 8L, 
8L, 6L, 3L, 3L, 3L, 3L, 2L, 6L), RESPI = c(6L, 7L, 5L, 1L, 1L, 
1L, 1L, 9L, 9L, 8L, 8L, 8L, 8L, 5L, 11L, 11L, 11L, 10L, 8L, 12L
), GROSS_RENEWD = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 6L, 6L, 6L, 6L, 
5L, 5L, 4L, 3L, 3L, 3L, 3L, 2L, 3L), ACTIVE = c(2L, 2L, 2L, 2L, 
2L, 2L, 1L, 18L, 18L, 17L, 17L, 15L, 14L, 10L, 5L, 4L, 4L, 4L, 
2L, 13L), NON.ACTIVE = c(8L, 11L, 9L, 5L, 5L, 3L, 3L, 18L, 18L, 
17L, 17L, 17L, 15L, 11L, 23L, 23L, 22L, 21L, 19L, 23L), START = c("09/2016", 
"11/2016", "01/2017", "04/2017", "06/2017", "10/2017", "02/2018", 
"10/2016", "12/2016", "02/2017", "04/2017", "07/2017", "11/2017", 
"04/2018", "10/2016", "12/2016", "02/2017", "04/2017", "07/2017", 
"10/2016"), STOP = c("10/2016", "11/2016", "01/2017", "04/2017", 
"06/2017", "10/2017", "03/2018", "10/2016", "12/2016", "02/2017", 
"04/2017", "07/2017", "11/2017", "04/2018", "10/2016", "12/2016", 
"02/2017", "04/2017", "07/2017", "10/2016")), row.names = c(NA, 
20L), class = "data.frame")

In general, I want the START axis to begin at the earliest date and end at the latest date when it is plotted.

Upvotes: 0

Views: 49

Answers (1)

Quinten

Reputation: 41337

Your START column is stored as character, so ggplot2 treats it as a discrete variable and orders the labels alphabetically rather than chronologically. You should convert it to a date format; the zoo package provides the function as.yearmon for that. To make the axis run from your first date to your last date, you could create a vector of date breaks from the min (start) date to the max (end) date. Here is a reproducible example:

library(ggplot2)
library(zoo)
library(dplyr)

# Parse the "mm/YYYY" strings into Date values (first day of each month)
df6 <- df6 %>%
  mutate(START = as.Date(as.yearmon(START, format = '%m/%Y')))

# One break per month, from the earliest to the latest START date
breaks.vec <- seq(from = min(df6$START), to = max(df6$START), by = 'month')

ggplot(data = df6, aes(x = START, y = AI, col = ID, group = ID)) +
  geom_point(size = 1.2, alpha = .8) +
  stat_smooth(aes(group = 1)) +
  # `fun` replaces the deprecated `fun.y` argument
  stat_summary(aes(group = 1), geom = "point", fun = mean, shape = 17, size = 3) +
  scale_x_date(breaks = breaks.vec, date_labels = "%m/%Y") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Created on 2022-10-17 with reprex v2.0.2
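
If you would rather not depend on zoo, the conversion can probably be done in base R as well. The following is only a minimal sketch: `start_chr` is a made-up sample in the same "mm/YYYY" format as START (not your full data), and `start_date` and `breaks_vec` are names I chose for illustration. It also shows why the character column sorts in the wrong order.

```r
# Hypothetical sample of month/year strings, same format as the START column
start_chr <- c("09/2016", "11/2016", "01/2017", "04/2018", "10/2016")

# Character vectors sort alphabetically, so the order is driven by the
# month digits and ignores the year
sort(start_chr)

# Base R alternative to zoo::as.yearmon: prepend a day so as.Date() can parse it
start_date <- as.Date(paste0("01/", start_chr), format = "%d/%m/%Y")

# Date values sort chronologically
sort(start_date)

# Monthly breaks from the first to the last observed month,
# suitable for scale_x_date(breaks = ...)
breaks_vec <- seq(from = min(start_date), to = max(start_date), by = "month")
```

In your own data you would apply the same `as.Date(paste0("01/", ...))` call to `df6$START` before plotting.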

Upvotes: 2
