Reputation: 101
I have a data.table dt which looks like:
>dt <- data.table(grp = c(1,1,1,1,1, 2,2,2,2,2,3,3,3,3,3),
date = c("2017-04-24", "2017-04-25", "2017-04-26", "2017-04-27","2017-04-28",
"2017-05-11", "2017-05-11", "2017-05-13", "2017-05-14","2017-05-15",
"2017-06-16", "2017-06-17", "2017-06-18", "2017-06-20", "2017-06-20"),
time = c(1817, 1902, 1704, 1404, 1152, 1344, 1455, 1235, 0844, 0744, 1439,1346, 1525,1211, 1333))
>dt
grp date time
1: 1 2017-04-24 1817
2: 1 2017-04-25 1902
3: 1 2017-04-26 1704
4: 1 2017-04-27 1404
5: 1 2017-04-28 1152
6: 2 2017-05-11 1344
7: 2 2017-05-11 1455
8: 2 2017-05-13 1235
9: 2 2017-05-14 844
10: 2 2017-05-15 744
11: 3 2017-06-16 1439
12: 3 2017-06-17 1346
13: 3 2017-06-18 1525
14: 3 2017-06-20 1211
15: 3 2017-06-20 1333
I wanted to find the minimum and maximum date and time for each of the group "grp".
In order to find minimum and maximum date I did:
dt[,"min_date" := min(date), by=c("grp")]
dt[,"max_date" := max(date), by=c("grp")]
I need to calculate the minimum and and maximum time too by each group "grp". the minimum time is the time associated with the record with minimum date, and maximum time is the time associate with the record with maximum date.
If my there is repetition of maximum date and minimum date in the records then the time associated with the final maximum record must be taken for max time and time associated with first minimum record should be taken for minimum time.
My final outcome should look like:
> dt
grp date time min_date max_date min_time max_time
1: 1 2017-04-24 1817 2017-04-24 2017-04-28 1817 1152
2: 1 2017-04-25 1902 2017-04-24 2017-04-28 1817 1152
3: 1 2017-04-26 1704 2017-04-24 2017-04-28 1817 1152
4: 1 2017-04-27 1404 2017-04-24 2017-04-28 1817 1152
5: 1 2017-04-28 1152 2017-04-24 2017-04-28 1817 1152
6: 2 2017-05-11 1344 2017-05-11 2017-05-15 1344 744
7: 2 2017-05-11 1455 2017-05-11 2017-05-15 1344 744
8: 2 2017-05-13 1235 2017-05-11 2017-05-15 1344 744
9: 2 2017-05-14 844 2017-05-11 2017-05-15 1344 744
10: 2 2017-05-15 744 2017-05-11 2017-05-15 1344 744
11: 3 2017-06-16 1439 2017-06-16 2017-06-20 1439 1333
12: 3 2017-06-17 1346 2017-06-16 2017-06-20 1439 1333
13: 3 2017-06-18 1525 2017-06-16 2017-06-20 1439 1333
14: 3 2017-06-20 1211 2017-06-16 2017-06-20 1439 1333
15: 3 2017-06-20 1333 2017-06-16 2017-06-20 1439 1333
How can I do this in R in data.table
Upvotes: 1
Views: 2758
Reputation: 1709
This should work:
dt[,"min_time" := min(time[which(min_date==date)]), by=grp]
dt[,"max_time" := max(time[which(max_date==date)]), by=grp]
dt
grp date time min_date max_date min_time max_time
1: 1 2017-04-24 1817 2017-04-24 2017-04-28 1817 1152
2: 1 2017-04-25 1902 2017-04-24 2017-04-28 1817 1152
3: 1 2017-04-26 1704 2017-04-24 2017-04-28 1817 1152
4: 1 2017-04-27 1404 2017-04-24 2017-04-28 1817 1152
5: 1 2017-04-28 1152 2017-04-24 2017-04-28 1817 1152
6: 2 2017-05-11 1344 2017-05-11 2017-05-15 1344 744
7: 2 2017-05-11 1455 2017-05-11 2017-05-15 1344 744
8: 2 2017-05-13 1235 2017-05-11 2017-05-15 1344 744
9: 2 2017-05-14 844 2017-05-11 2017-05-15 1344 744
10: 2 2017-05-15 744 2017-05-11 2017-05-15 1344 744
11: 3 2017-06-16 1439 2017-06-16 2017-06-20 1439 1333
12: 3 2017-06-17 1346 2017-06-16 2017-06-20 1439 1333
13: 3 2017-06-18 1525 2017-06-16 2017-06-20 1439 1333
14: 3 2017-06-20 1211 2017-06-16 2017-06-20 1439 1333
15: 3 2017-06-20 1333 2017-06-16 2017-06-20 1439 1333
Or in one line:
dt[, `:=`(min_time = min(time[which(min_date==date)]), max_time = max(time[which(max_date==date)])), by=grp]
Upvotes: 1