Mr.Takeo

Reputation: 223

R: How to select the row with the earliest minute in every hour?

I'm still new to R. In SQL Server Management Studio I can select the first row of every hour using a partition query. However, running that query through sqlQuery() in R takes too long, and it slows down my Shiny web app even more. The fastest approach I have tested so far is to query all of the data and then prepare it in R.

Here is my example data:

                    Date Val
1    2017-06-11 04:00:02  83
2    2017-06-11 04:01:02  77
3    2017-06-11 04:03:01  78
..
4    2017-05-11 05:00:03  83
5    2017-05-11 05:01:02  95
6    2017-05-11 05:02:02  10
..
7    2017-05-11 06:01:00  29
8    2017-05-11 06:02:02  39
9    2017-05-11 06:03:03  85
10   2017-05-11 06:04:02  71
..
11   2017-05-11 07:05:02  33
12   2017-05-11 07:06:02  14
.. (many rows)
13   2017-05-12 00:01:01  20
14   2017-05-12 00:03:01  45

Sometimes rows are missing within an hour. My problem is how to keep only the row with the earliest minute in every hour.

Expected output:

                    Date Val
1    2017-05-11 04:00:02  83
4    2017-05-11 05:00:03  83
7    2017-05-11 06:01:00  29
11   2017-05-11 07:05:02  33
13   2017-05-12 00:01:01  20

Any ideas would be appreciated. Thank you so much.

Upvotes: 1

Views: 1426

Answers (2)

neilfws

Reputation: 33772

You can use lubridate and dplyr to parse the dates, extract the date and hour, then keep the row with the minimum time within each date and hour. I changed your June 31 to June 30 :)

library(dplyr)
library(lubridate)

dataset %>% 
  mutate(Date = ymd_hms(Date), dt = as_date(Date), hr = hour(Date)) %>% 
  group_by(dt, hr) %>% 
  filter(Date == min(Date)) %>% 
  ungroup() %>% 
  select(Date, Val)

                 Date   Val
               <dttm> <int>
1 2017-06-30 04:00:02    83
2 2017-05-11 05:00:03    83
3 2017-05-11 06:01:00    29
4 2017-05-11 07:05:02    33
5 2017-05-12 00:01:01    20
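
For anyone who would rather not load extra packages, the same grouping idea can be sketched in base R. This is only a minimal sketch: the rows below are a subset of the sample data in the question, and the UTC time zone is an assumption made to avoid daylight-saving surprises.

```r
# A few rows taken from the question's sample data
dataset <- data.frame(
  Date = c("2017-05-11 05:00:03", "2017-05-11 05:01:02", "2017-05-11 05:02:02",
           "2017-05-11 06:01:00", "2017-05-11 06:02:02", "2017-05-11 07:05:02",
           "2017-05-11 07:06:02", "2017-05-12 00:01:01", "2017-05-12 00:03:01"),
  Val = c(83, 95, 10, 29, 39, 33, 14, 20, 45),
  stringsAsFactors = FALSE
)

# Parse the timestamps (the UTC time zone is an assumption)
dataset$Date <- as.POSIXct(dataset$Date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Sort by time, then build a "date + hour" key for every row
dataset <- dataset[order(dataset$Date), ]
hour_key <- format(dataset$Date, "%Y-%m-%d %H")

# Keep the first row seen for each key, i.e. the earliest row in that hour
first_per_hour <- dataset[!duplicated(hour_key), ]
print(first_per_hour)
```

Because the data is sorted first, duplicated() marks every row after the earliest one in each hour, so negating it keeps exactly one row per date and hour.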

Upvotes: 1

amatsuo_net

Reputation: 2448

Here is how I would do it. First, order the rows by date. Second, group the data by hour (that is, floor Date to the hour, then group by the floored value) and pick the first row in each group.

library(dplyr)
library(data.table)
library(lubridate)

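# Convert to a data.table by reference and sort by timestamp
df <- setDT(df)[order(Date)]
# Group by the timestamp floored to the hour and keep the first (earliest) row
output <- df[, .(Date = Date[1], Val = Val[1]),
             by = .(Group = floor_date(Date, "hour"))]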
head(output)
##                  Group                Date Val
## 1: 2017-05-11 04:00:00 2017-05-11 04:00:49  99
## 2: 2017-05-11 05:00:00 2017-05-11 05:01:13  45
## 3: 2017-05-11 06:00:00 2017-05-11 06:00:08  45
## 4: 2017-05-11 07:00:00 2017-05-11 07:00:20  64
## 5: 2017-05-11 08:00:00 2017-05-11 08:00:18  99
## 6: 2017-05-11 09:00:00 2017-05-11 09:06:25  40

I tested this with the following example data:

time_origin <- ymd_hms("2017-05-11 04:00:02")
set.seed(123)
df <- data.frame(Date = (sample(100000, size = 1000, TRUE) %>% seconds() %>%
                                  `+`(time_origin)),
                  Val = sample(99, size = 1000, TRUE)) 
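
The same floor-then-group idea can also be written with dplyr alone. The following is a hedged sketch rather than a benchmarked alternative: it assumes dplyr 1.0.0 or later (for slice_min()) and uses a tiny hand-made data frame with hypothetical values instead of the random test data.

```r
library(dplyr)
library(lubridate)

# Small illustrative data set (hypothetical values, deliberately unsorted)
df <- data.frame(
  Date = ymd_hms(c("2017-05-11 04:10:00", "2017-05-11 04:00:49",
                   "2017-05-11 05:01:13", "2017-05-11 05:30:00")),
  Val = c(10, 99, 45, 7)
)

output <- df %>%
  group_by(Group = floor_date(Date, "hour")) %>%   # bucket each timestamp into its hour
  slice_min(Date, n = 1, with_ties = FALSE) %>%    # keep the earliest row per bucket
  ungroup()

print(output)
```

With slice_min() there is no need to sort beforehand, and with_ties = FALSE guarantees a single row per hour even if two rows share the same timestamp.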

Upvotes: 1
