Kathryn Withers

Reputation: 35

Getting the max for a set of rows in R

I have a set of time series data made up of 1-minute intervals for a whole month, and I would like to get some summary results from it.

Can't add a sample of my data until I get 10 posts.

I have already copied it to another dataset so I don't alter the original.

busiestmin <- rdata6
dput(MyData)
busiestmin[ busiestmin$Utilized == max(busiestmin$Utilized) , ] 
# A tibble: 1 × 3
  Entitled Utilized            datetime
  <dbl>    <dbl>              <dttm>
    1     2.73 2016-12-18 02:10:00
busiestmin[ busiestmin$Utilized == min(busiestmin$Utilized) , ]
# A tibble: 22 × 3
   Entitled Utilized            datetime
    <dbl>    <dbl>              <dttm>
      0        0 2016-12-11 03:03:00
      0        0 2016-12-11 03:04:00
      0        0 2016-12-11 03:05:00
      0        0 2016-12-11 03:06:00
      0        0 2016-12-11 03:07:00
      0        0 2016-12-11 03:08:00
      0        0 2016-12-11 03:09:00
      0        0 2016-12-11 03:10:00
      0        0 2016-12-11 03:11:00
      0        0 2016-12-11 03:12:00

# ... with 12 more rows

As you can see above, I know how to get the min and max as a single row, but I would like to get them for each set of 60 rows (1 hour) and each set of 1440 rows (1 day).

Not sure if the link to the sample data set will work?

dput (MyData) 
 Entitled   Utilized    datetime
  1     1.04        12/3/2016 0:01
  1     1.04        12/3/2016 0:02
  1     1.04        12/3/2016 0:03
  1     1.20        12/3/2016 0:04
  1     1.21        12/3/2016 0:05

Upvotes: 2

Views: 100

Answers (1)

Julia Silge

Reputation: 11603

I think that using lubridate is going to help you out here a lot.

This is how I read in your example data:

library(readr)
example_data <- read_csv("Entitled, Utilized, datetime\n
                                 1,     1.04, 2016-12-03 00:01:00\n
                                 1,     1.04, 2016-12-03 00:02:00\n
                                 1,     1.04, 2016-12-03 00:03:00\n
                                 1,     1.20, 2016-12-03 00:04:00\n
                                 1,     1.21, 2016-12-03 00:05:00\n
                                 1,     1.05, 2016-12-03 00:06:00\n
                                 1,     1.05, 2016-12-03 00:07:00\n
                                 1,     1.05, 2016-12-03 00:08:00\n
                                 1,     1.43, 2016-12-03 00:09:00\n
                                 1,     1.60, 2016-12-03 00:10:00")
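Note that I typed the timestamps above in ISO form so that read_csv parses them as date-times automatically. If your real column is still text in the 12/3/2016 0:01 form shown in the question, it probably needs converting first. Here is a minimal sketch, assuming the order is month/day/year (which matches the December dates in your tibble output); the raw_datetime vector is made up for illustration:

```r
library(lubridate)

# Assumption: the raw column is character text in month/day/year hour:minute form
raw_datetime <- c("12/3/2016 0:01", "12/3/2016 0:02", "12/3/2016 0:03")

# mdy_hm() parses month-day-year plus hour-minute; the result is POSIXct (UTC by default)
parsed <- mdy_hm(raw_datetime)
```

If your tibble already shows the column as <dttm>, as in your printed output, you can skip this step.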

Since your example data is all from the same hour and day, we aren't going to be able to see differences between hours or days, but this should work for what you are describing. The first thing we need to do is set up a new variable that keeps track of which day (or hour) each timestamp came from. You can use floor_date from lubridate for that.

library(lubridate)
library(dplyr)

example_data %>% 
    mutate(FloorDate = floor_date(datetime, unit = "1 day"))
#> # A tibble: 10 × 4
#>    Entitled Utilized            datetime  FloorDate
#>       <int>    <dbl>              <dttm>     <dttm>
#> 1         1     1.04 2016-12-03 00:01:00 2016-12-03
#> 2         1     1.04 2016-12-03 00:02:00 2016-12-03
#> 3         1     1.04 2016-12-03 00:03:00 2016-12-03
#> 4         1     1.20 2016-12-03 00:04:00 2016-12-03
#> 5         1     1.21 2016-12-03 00:05:00 2016-12-03
#> 6         1     1.05 2016-12-03 00:06:00 2016-12-03
#> 7         1     1.05 2016-12-03 00:07:00 2016-12-03
#> 8         1     1.05 2016-12-03 00:08:00 2016-12-03
#> 9         1     1.43 2016-12-03 00:09:00 2016-12-03
#> 10        1     1.60 2016-12-03 00:10:00 2016-12-03

If your data has more than one day or hour in it, you will see those in the new column we just created. You can use unit = "1 hour" if you want to find the max/min for each hour. Now we can use group_by and summarize to find the max and min for each day.

example_data %>% 
    mutate(FloorDate = floor_date(datetime, unit = "1 day")) %>%
    group_by(FloorDate) %>%
    summarise(MaxUtilized = max(Utilized),
              MinUtilized = min(Utilized))
#> # A tibble: 1 × 3
#>    FloorDate MaxUtilized MinUtilized
#>       <dttm>       <dbl>       <dbl>
#> 1 2016-12-03         1.6        1.04

If your real data has multiple days in it, your result here will have rows for each day, with the max and min for each.
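To see what that looks like with more than one hour and day, here is a rough sketch on made-up data. The synthetic tibble and its Utilized values are purely hypothetical (a deterministic pattern, not real measurements); only the column names follow your data. It also shows one way to keep the whole row where each hour's maximum occurred, similar to the subsetting in your question.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: two full days at 1-minute intervals (2880 rows).
# Utilized is an invented repeating pattern, not real measurements.
datetime <- seq(ymd_hm("2016-12-03 00:00"), by = "1 min", length.out = 2 * 1440)
synthetic <- tibble(
    datetime = datetime,
    Utilized = (seq_along(datetime) %% 97) / 100
)

# One row per hour: each group of 60 one-minute readings gives a max and a min
hourly <- synthetic %>%
    mutate(FloorHour = floor_date(datetime, unit = "1 hour")) %>%
    group_by(FloorHour) %>%
    summarise(MaxUtilized = max(Utilized),
              MinUtilized = min(Utilized))

# One row per day: each group of 1440 one-minute readings gives a max and a min
daily <- synthetic %>%
    mutate(FloorDate = floor_date(datetime, unit = "1 day")) %>%
    group_by(FloorDate) %>%
    summarise(MaxUtilized = max(Utilized),
              MinUtilized = min(Utilized))

# Keep the full row(s) where each hour's maximum occurred, so the exact minute is visible
busiest_per_hour <- synthetic %>%
    group_by(FloorHour = floor_date(datetime, unit = "1 hour")) %>%
    filter(Utilized == max(Utilized)) %>%
    ungroup()
```

Here hourly has one row for each of the 48 hours and daily has one row for each of the 2 days. With real data, the filter step can return more than one row per hour when several minutes tie for the maximum.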

Upvotes: 3
