Reputation: 35
I have a set of time series data made up of 1-minute intervals covering a whole month, and I am looking to get some results from this data.
Can't add a sample of my data until I get 10 posts.
I have already copied it to another dataset so I don't alter the original.
busiestmin <- rdata6
dput(MyData)
busiestmin[ busiestmin$Utilized == max(busiestmin$Utilized) , ]
# A tibble: 1 × 3
Entitled Utilized datetime
<dbl> <dbl> <dttm>
1 2.73 2016-12-18 02:10:00
busiestmin[ busiestmin$Utilized == min(busiestmin$Utilized) , ]
# A tibble: 22 × 3
Entitled Utilized datetime
<dbl> <dbl> <dttm>
0 0 2016-12-11 03:03:00
0 0 2016-12-11 03:04:00
0 0 2016-12-11 03:05:00
0 0 2016-12-11 03:06:00
0 0 2016-12-11 03:07:00
0 0 2016-12-11 03:08:00
0 0 2016-12-11 03:09:00
0 0 2016-12-11 03:10:00
0 0 2016-12-11 03:11:00
0 0 2016-12-11 03:12:00
# ... with 12 more rows
As you can see above, I know how to get the min and max for a single row, but I would like to get them for a set of 60 rows (1 hour) and 1440 rows (1 day).
Not sure if the link to the sample data set will work?
dput (MyData)
Entitled Utilized datetime
1 1.04 12/3/2016 0:01
1 1.04 12/3/2016 0:02
1 1.04 12/3/2016 0:03
1 1.20 12/3/2016 0:04
1 1.21 12/3/2016 0:05
Upvotes: 2
Views: 100
Reputation: 11603
I think that using lubridate is going to help you out here a lot.
This is how I read in your example data:
library(readr)
example_data <- read_csv("Entitled, Utilized, datetime\n
1, 1.04, 2016-12-03 00:01:00\n
1, 1.04, 2016-12-03 00:02:00\n
1, 1.04, 2016-12-03 00:03:00\n
1, 1.20, 2016-12-03 00:04:00\n
1, 1.21, 2016-12-03 00:05:00\n
1, 1.05, 2016-12-03 00:06:00\n
1, 1.05, 2016-12-03 00:07:00\n
1, 1.05, 2016-12-03 00:08:00\n
1, 1.43, 2016-12-03 00:09:00\n
1, 1.60, 2016-12-03 00:10:00")
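The sample in the question stores its timestamps as text such as 12/3/2016 0:01, so the real data may need parsing before the steps below will work. Here is a minimal sketch, assuming the values are in month/day/year order (the question's output is from December 2016) and that the column arrives as character. The name raw_data and its three rows are hypothetical stand-ins, not your actual data.

```r
library(lubridate)

# Hypothetical stand-in for the question's data, with datetime still stored as text
raw_data <- data.frame(
  Entitled = c(1, 1, 1),
  Utilized = c(1.04, 1.04, 1.20),
  datetime = c("12/3/2016 0:01", "12/3/2016 0:02", "12/3/2016 0:04"),
  stringsAsFactors = FALSE
)

# mdy_hm() parses "month/day/year hour:minute" strings into POSIXct (UTC by default)
raw_data$datetime <- mdy_hm(raw_data$datetime)
```

If your timestamps turn out to be day/month/year instead, dmy_hm() would be the matching parser.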
Since your example data is all from the same hour and day, we aren't going to be able to see differences between hours or days, but this should work for what you are describing. The first thing we need to do is set up a new variable that keeps track of which day (or hour) each timestamp came from. You can use floor_date from lubridate for that.
library(lubridate)
library(dplyr)
example_data %>%
  mutate(FloorDate = floor_date(datetime, unit = "1 day"))
#> # A tibble: 10 × 4
#> Entitled Utilized datetime FloorDate
#> <int> <dbl> <dttm> <dttm>
#> 1 1 1.04 2016-12-03 00:01:00 2016-12-03
#> 2 1 1.04 2016-12-03 00:02:00 2016-12-03
#> 3 1 1.04 2016-12-03 00:03:00 2016-12-03
#> 4 1 1.20 2016-12-03 00:04:00 2016-12-03
#> 5 1 1.21 2016-12-03 00:05:00 2016-12-03
#> 6 1 1.05 2016-12-03 00:06:00 2016-12-03
#> 7 1 1.05 2016-12-03 00:07:00 2016-12-03
#> 8 1 1.05 2016-12-03 00:08:00 2016-12-03
#> 9 1 1.43 2016-12-03 00:09:00 2016-12-03
#> 10 1 1.60 2016-12-03 00:10:00 2016-12-03
If your data has more than one day or hour in it, you will see those in the new column we just created. You can use unit = "1 hour" if you want to find the max/min for each hour. Now we can use group_by and summarise to find the max and min for each day.
example_data %>%
  mutate(FloorDate = floor_date(datetime, unit = "1 day")) %>%
  group_by(FloorDate) %>%
  summarise(MaxUtilized = max(Utilized),
            MinUtilized = min(Utilized))
#> # A tibble: 1 × 3
#> FloorDate MaxUtilized MinUtilized
#> <dttm> <dbl> <dbl>
#> 1 2016-12-03 1.6 1.04
If your real data has multiple days in it, your result here will have rows for each day, with the max and min for each.
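To see the same approach produce more than one row, here is a rough sketch of the hourly version on made-up data spanning three hours. The multi_hour tibble and its values are invented for illustration, and treating the hour with the highest mean Utilized as the "busiest" is my assumption about what you are after, not something stated in the question.

```r
library(dplyr)
library(lubridate)

# Made-up data: 180 one-minute readings covering three consecutive hours
multi_hour <- tibble(
  datetime = seq(ymd_hm("2016-12-03 00:00"), by = "1 min", length.out = 180),
  Utilized = rep(c(1, 2, 3), each = 60) + (0:179 %% 60) / 100
)

# Same pattern as above, but flooring each timestamp to its hour
hourly <- multi_hour %>%
  mutate(FloorHour = floor_date(datetime, unit = "1 hour")) %>%
  group_by(FloorHour) %>%
  summarise(MaxUtilized = max(Utilized),
            MinUtilized = min(Utilized),
            MeanUtilized = mean(Utilized))

# Assumption: "busiest" means the hour with the highest mean Utilized
busiest_hour <- hourly %>%
  filter(MeanUtilized == max(MeanUtilized))
```

Here hourly has one row per hour, and busiest_hour keeps only the row(s) with the largest mean. Swapping unit = "1 hour" for unit = "1 day" gives the daily equivalent.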
Upvotes: 3