Reputation: 85
I am having this problem with R:
I have a dataset called "teste" with a 'Date' column (POSIXct, format = "%Y-%m-%d %H:%M:%S") that holds readings taken every 10 minutes over a 5-month period.
I need to compare the variables at the same time of day across different days. For example, plot every Saturday in the dataset, overlaid on one graph. I already have the code for subsetting the data.frame so that it contains only the Saturdays.
Here is a sample of the data:
DATE ID VAR1 VAR2 VAR3
1 2016-09-19 00:07:47 79 19 0 OPN
2 2016-09-19 00:17:47 79 18 1 OPN
3 2016-09-19 00:27:47 79 16 3 OPN
4 2016-09-19 00:37:47 79 15 4 OPN
5 2016-09-19 00:47:47 79 16 3 OPN
6 2016-09-19 00:57:47 79 16 3 OPN
Here is the dput from the data:
structure(list(FECHA = structure(c(1474236467, 1474237067, 1474237667, 1474841253, 1474841853, 1474842453), class = c("POSIXct", "POSIXt" ), tzone = ""), ID = c(79L, 79L, 79L, 79L, 79L, 79L), SLOTS = c(19L, 18L, 16L, 14L, 15L, 15L), BIKES = c(0L, 1L, 3L, 8L, 7L, 7L), STATUS = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CLS", "OPN"), class = "factor")), .Names = c("FECHA", "ID", "SLOTS", "BIKES", "STATUS"), row.names = c(1L, 2L, 3L, 1004L, 1005L, 1006L ), class = "data.frame")
I tried this (using the lubridate package):
plot(as.POSIXct(paste(hour(teste$FECHA),":",minute(teste$FECHA),sep = ""), format = "%H:%M"),teste$BIKES)
It works, but using 'paste' is definitely not the best way to do this. There is probably an easier and more elegant way, right? If so, how?
I also have a problem when I plot it with type = "lines", because R doesn't know that the last reading of one day shouldn't be connected to the first reading of the next one, which gives this result (see the lines crossing the whole graph from 24 hours back to 0 hours):
I thought about plotting one day at a time, using plot for the first day and then the lines function for the others, but the problem is that I don't know how many readings there are in each day. Each day should have 6*24=144 readings, but some have 143 or 142 (because of problems while collecting the data).
I appreciate any help.
Upvotes: 1
Views: 1358
Reputation: 146
Please try this:
#Generate dataset
teste=structure(list(FECHA = structure(c(1474236467, 1474237067, 1474237667, 1474841253, 1474841853, 1474842453), class = c("POSIXct", "POSIXt" ), tzone = ""), ID = c(79L, 79L, 79L, 79L, 79L, 79L), SLOTS = c(19L, 18L, 16L, 14L, 15L, 15L), BIKES = c(0L, 1L, 3L, 8L, 7L, 7L), STATUS = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CLS", "OPN"), class = "factor")), .Names = c("FECHA", "ID", "SLOTS", "BIKES", "STATUS"), row.names = c(1L, 2L, 3L, 1004L, 1005L, 1006L ), class = "data.frame")
# label each reading with its calendar date (used for grouping)
teste$day <- format(teste$FECHA,'%Y-%m-%d')
# label each reading with its hour and minute (used as the x value)
teste$hm <- format(teste$FECHA, format = "%H:%M")
# plot hour & minute on x, with one line per day
library(ggplot2)
ggplot(teste, aes(x=hm, y=BIKES, group=day)) + geom_point() + geom_line()
If you want a more elegant plot, you can decorate it using the ggplot2 grammar. For example:
ggplot(teste, aes(x=hm, y=BIKES, group=day)) +
geom_point(aes(colour = factor(day))) +
geom_line(aes(colour = factor(day))) +
theme_bw()+labs(colour="Date")
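The same grouping idea also works in base graphics, which is closer to the plot/lines approach described in the question. The following is only a sketch: it rebuilds the sample data with an assumed UTC time zone (the original tzone is empty), and `hour_of_day` and `by_day` are names made up for illustration. Splitting by day means the number of readings per day never has to be known in advance, and no line is drawn from one day to the next.

```r
# Sample readings from the question; the UTC time zone is an assumption
teste <- data.frame(
  FECHA = as.POSIXct(c(1474236467, 1474237067, 1474237667,
                       1474841253, 1474841853, 1474842453),
                     origin = "1970-01-01", tz = "UTC"),
  BIKES = c(0L, 1L, 3L, 8L, 7L, 7L)
)

# Calendar day, used as the grouping key
teste$day <- format(teste$FECHA, "%Y-%m-%d")

# Time of day as decimal hours since midnight (no paste() needed)
lt <- as.POSIXlt(teste$FECHA)
teste$hour_of_day <- lt$hour + lt$min / 60 + lt$sec / 3600

# One data.frame per day, whatever number of readings each day has
by_day <- split(teste, teste$day)

# Empty frame covering a full day, then one line per day
plot(c(0, 24), range(teste$BIKES), type = "n",
     xlab = "Hour of day", ylab = "BIKES")
for (i in seq_along(by_day)) {
  lines(by_day[[i]]$hour_of_day, by_day[[i]]$BIKES, type = "o", col = i)
}
legend("topleft", legend = names(by_day), col = seq_along(by_day), lty = 1)
```

Because each day is drawn by its own lines() call, days with 143 or 142 readings need no special handling.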
Upvotes: 0
Reputation: 17648
You can try
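library(ggplot2)
# d is the data.frame from the question (called teste there)
d <- teste
# add factor to differentiate between the different days
d$day <- factor(strftime(d$FECHA, format="%Y-%m-%d"))
# keep only the time of day and pin it to one fixed date, so that the axis limits below match no matter when the code is run
d$time <- as.POSIXct(paste("2017-04-19", strftime(d$FECHA, format="%H:%M:%S")))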
# color per day
ggplot(d, aes(x=time, y=BIKES, col=day)) +
geom_point() +
geom_line() +
scale_x_datetime(limits = as.POSIXct(c("2017-04-19 00:00:00", "2017-04-19 24:00:00")), date_labels = "%H:%M")
# or facet_wrap/grid
ggplot(d, aes(x=time, y=BIKES)) + geom_point() + geom_line() +
facet_grid(~day)+
scale_x_datetime(limits = as.POSIXct(c("2017-04-19 00:00:00", "2017-04-19 24:00:00")), date_labels = "%H:%M")
# or use the wrap function instead of the grid function
facet_wrap(~day, scales = "free_y")
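Before faceting several months of data, it may help to keep a single weekday and check how many readings each day actually has, since the question mentions days with 143 or 142 readings. This is a rough sketch in base R; the UTC time zone and the names `target_wday`, `one_weekday` and `readings_per_day` are assumptions for illustration. In UTC the six sample rows fall on Sundays, so the example selects weekday 0; setting `target_wday` to 6 would select Saturdays instead.

```r
# Sample readings from the question; the UTC time zone is an assumption
d <- data.frame(
  FECHA = as.POSIXct(c(1474236467, 1474237067, 1474237667,
                       1474841253, 1474841853, 1474842453),
                     origin = "1970-01-01", tz = "UTC"),
  BIKES = c(0L, 1L, 3L, 8L, 7L, 7L)
)

# Calendar day and weekday number (POSIXlt convention: 0 = Sunday ... 6 = Saturday)
d$day  <- format(d$FECHA, "%Y-%m-%d")
d$wday <- as.POSIXlt(d$FECHA)$wday

# Keep one weekday only; 6 would keep the Saturdays
target_wday <- 0
one_weekday <- d[d$wday == target_wday, ]

# Number of readings per day, to spot days with missing data
readings_per_day <- table(one_weekday$day)
print(readings_per_day)
```

The resulting `one_weekday` data.frame can then be passed to either of the plots above in place of `d`.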
Upvotes: 0