Reputation: 223
I'm still new to R. In SQL Server I can select the first row of every hour using a partition. However, I found a big problem in my program: the query takes too long with the sqlQuery() function in R, and even longer when it runs inside my Shiny web app. The one approach I have tested that takes less time is to query all the data and prepare it in R, which is much faster.
Here is my example data:
Date Val
1 2017-06-11 04:00:02 83
2 2017-06-11 04:01:02 77
3 2017-06-11 04:03:01 78
..
4 2017-05-11 05:00:03 83
5 2017-05-11 05:01:02 95
6 2017-05-11 05:02:02 10
..
7 2017-05-11 06:01:00 29
8 2017-05-11 06:02:02 39
9 2017-05-11 06:03:03 85
10 2017-05-11 06:04:02 71
..
11 2017-05-11 07:05:02 33
12 2017-05-11 07:06:02 14
.. (many rows)
13 2017-05-12 00:01:01 20
14 2017-05-12 00:03:01 45
Sometimes rows are missing within an hour, so the first row of an hour is not always at minute 00. My problem is how to keep only the row with the earliest timestamp in each hour.
My expected output:
Date Val
1 2017-05-11 04:00:02 83
4 2017-05-11 05:00:03 83
7 2017-05-11 06:01:00 29
11 2017-05-11 07:05:02 33
13 2017-05-12 00:01:01 20
Any ideas? Please help. Thank you so much.
Upvotes: 1
Views: 1426
Reputation: 33772
You can use lubridate and dplyr to convert the dates, extract the date and hour, then filter for the minimum time within each hour of each date. I changed your June 31 to June 30 :)
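library(dplyr)
library(lubridate)
dataset %>%
  # parse the timestamps, then derive the calendar date and the hour of day
  mutate(Date = ymd_hms(Date), dt = as_date(Date), hr = hour(Date)) %>%
  group_by(dt, hr) %>%
  # keep the earliest timestamp within each date/hour group
  filter(Date == min(Date)) %>%
  ungroup() %>%
  select(Date, Val)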
Date Val
<dttm> <int>
1 2017-06-30 04:00:02 83
2 2017-05-11 05:00:03 83
3 2017-05-11 06:01:00 29
4 2017-05-11 07:05:02 33
5 2017-05-12 00:01:01 20
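To try this without a database, here is a minimal self-contained sketch. It uses rows 4 to 14 from the question as sample data and keeps the `dataset` name from the code above. Two things differ from the pipeline above and are my own variation, not part of the original answer: the grouping key is built with `floor_date()` instead of separate date and hour columns, and `slice_min(..., with_ties = FALSE)` (available in dplyr 1.0.0 and later) replaces `filter(Date == min(Date))` so that exactly one row per hour is kept even if two rows share the same earliest timestamp. The name `first_per_hour` is just illustrative.

```r
library(dplyr)
library(lubridate)

# Rows 4-14 from the question, stored as character timestamps
dataset <- data.frame(
  Date = c("2017-05-11 05:00:03", "2017-05-11 05:01:02", "2017-05-11 05:02:02",
           "2017-05-11 06:01:00", "2017-05-11 06:02:02", "2017-05-11 06:03:03",
           "2017-05-11 06:04:02", "2017-05-11 07:05:02", "2017-05-11 07:06:02",
           "2017-05-12 00:01:01", "2017-05-12 00:03:01"),
  Val = c(83L, 95L, 10L, 29L, 39L, 85L, 71L, 33L, 14L, 20L, 45L)
)

first_per_hour <- dataset %>%
  # parse the timestamps and round each one down to the start of its hour
  mutate(Date = ymd_hms(Date), hr = floor_date(Date, "hour")) %>%
  group_by(hr) %>%
  # keep exactly one row per hour: the one with the smallest timestamp
  slice_min(Date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Date, Val)

print(first_per_hour)
```

If ties should all be kept, the original `filter(Date == min(Date))` step does that.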
Upvotes: 1
Reputation: 2448
Here is how I would do it. First, order the rows by Date. Second, group the data by hour (that is, floor each Date to the hour and group by the floored value), then pick the first row in each group.
library(dplyr)
library(data.table)
library(lubridate)
# convert to a data.table and sort by timestamp
df <- setDT(df)[order(Date)]
# take the first (earliest) row within each hour
output <- df[, .(Date = Date[1], Val = Val[1]),
             by = .(Group = floor_date(Date, "hour"))]
head(output)
## Group Date Val
## 1: 2017-05-11 04:00:00 2017-05-11 04:00:49 99
## 2: 2017-05-11 05:00:00 2017-05-11 05:01:13 45
## 3: 2017-05-11 06:00:00 2017-05-11 06:00:08 45
## 4: 2017-05-11 07:00:00 2017-05-11 07:00:20 64
## 5: 2017-05-11 08:00:00 2017-05-11 08:00:18 99
## 6: 2017-05-11 09:00:00 2017-05-11 09:06:25 40
I tested with this example data (create it before running the code above):
time_origin <- ymd_hms("2017-05-11 04:00:02")
set.seed(123)
# 1000 random timestamps within 100000 seconds after time_origin
df <- data.frame(Date = (sample(100000, size = 1000, TRUE) %>% seconds() %>%
                           `+`(time_origin)),
                 Val = sample(99, size = 1000, TRUE))
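The same order, then group, then take-first idea can also be sketched in base R with no extra packages, which may matter if the Shiny app should carry few dependencies. This is an alternative I am adding, not part of the answer above: it builds a per-hour key with `format()` and keeps the first row for each key with `duplicated()`. The sample rows are taken from the question and deliberately shuffled to show why the sort comes first; `hour_key` is a hypothetical helper name.

```r
# A few rows from the question, in shuffled order
df <- data.frame(
  Date = as.POSIXct(c("2017-05-11 06:02:02", "2017-05-11 05:01:02",
                      "2017-05-12 00:03:01", "2017-05-11 05:00:03",
                      "2017-05-11 07:05:02", "2017-05-11 06:01:00",
                      "2017-05-12 00:01:01", "2017-05-11 07:06:02"),
                    tz = "UTC"),
  Val = c(39L, 95L, 45L, 83L, 33L, 29L, 20L, 14L)
)

# 1. sort by timestamp so the earliest row of each hour comes first
df <- df[order(df$Date), ]

# 2. build a key that identifies the calendar date and hour of each row
hour_key <- format(df$Date, "%Y-%m-%d %H")

# 3. keep only the first row seen for each key
output <- df[!duplicated(hour_key), ]

print(output)
```

`duplicated()` marks every occurrence after the first, so negating it keeps exactly one row per hour.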
Upvotes: 1