Jack
Jack

Reputation: 101

taking minimum and maximum date and time in R data.table group by

I have a data.table dt which looks like:

>dt <- data.table(grp = c(1,1,1,1,1, 2,2,2,2,2,3,3,3,3,3),
                 date = c("2017-04-24", "2017-04-25", "2017-04-26", "2017-04-27","2017-04-28", 
                          "2017-05-11", "2017-05-11", "2017-05-13", "2017-05-14","2017-05-15",
                          "2017-06-16", "2017-06-17", "2017-06-18", "2017-06-20", "2017-06-20"),
                 time = c(1817, 1902, 1704, 1404, 1152, 1344, 1455, 1235, 0844, 0744, 1439,1346, 1525,1211, 1333))
>dt
    grp       date time
 1:   1 2017-04-24 1817
 2:   1 2017-04-25 1902
 3:   1 2017-04-26 1704
 4:   1 2017-04-27 1404
 5:   1 2017-04-28 1152
 6:   2 2017-05-11 1344
 7:   2 2017-05-11 1455
 8:   2 2017-05-13 1235
 9:   2 2017-05-14  844
10:   2 2017-05-15  744
11:   3 2017-06-16 1439
12:   3 2017-06-17 1346
13:   3 2017-06-18 1525
14:   3 2017-06-20 1211
15:   3 2017-06-20 1333

I wanted to find the minimum and maximum date and time for each of the group "grp".

In order to find minimum and maximum date I did:

dt[,"min_date" := min(date), by=c("grp")]
dt[,"max_date" := max(date), by=c("grp")] 

I need to calculate the minimum and and maximum time too by each group "grp". the minimum time is the time associated with the record with minimum date, and maximum time is the time associate with the record with maximum date.

If my there is repetition of maximum date and minimum date in the records then the time associated with the final maximum record must be taken for max time and time associated with first minimum record should be taken for minimum time.

My final outcome should look like:

> dt
    grp       date time   min_date   max_date min_time max_time
 1:   1 2017-04-24 1817 2017-04-24 2017-04-28     1817     1152
 2:   1 2017-04-25 1902 2017-04-24 2017-04-28     1817     1152
 3:   1 2017-04-26 1704 2017-04-24 2017-04-28     1817     1152
 4:   1 2017-04-27 1404 2017-04-24 2017-04-28     1817     1152
 5:   1 2017-04-28 1152 2017-04-24 2017-04-28     1817     1152
 6:   2 2017-05-11 1344 2017-05-11 2017-05-15     1344      744
 7:   2 2017-05-11 1455 2017-05-11 2017-05-15     1344      744
 8:   2 2017-05-13 1235 2017-05-11 2017-05-15     1344      744
 9:   2 2017-05-14  844 2017-05-11 2017-05-15     1344      744
10:   2 2017-05-15  744 2017-05-11 2017-05-15     1344      744
11:   3 2017-06-16 1439 2017-06-16 2017-06-20     1439     1333
12:   3 2017-06-17 1346 2017-06-16 2017-06-20     1439     1333
13:   3 2017-06-18 1525 2017-06-16 2017-06-20     1439     1333
14:   3 2017-06-20 1211 2017-06-16 2017-06-20     1439     1333
15:   3 2017-06-20 1333 2017-06-16 2017-06-20     1439     1333

How can I do this in R in data.table

Upvotes: 1

Views: 2758

Answers (1)

Yannis Vassiliadis
Yannis Vassiliadis

Reputation: 1709

This should work:

dt[,"min_time" := min(time[which(min_date==date)]), by=grp]
dt[,"max_time" := max(time[which(max_date==date)]), by=grp]
dt
        grp       date time   min_date   max_date min_time max_time
 1:   1 2017-04-24 1817 2017-04-24 2017-04-28     1817     1152
 2:   1 2017-04-25 1902 2017-04-24 2017-04-28     1817     1152
 3:   1 2017-04-26 1704 2017-04-24 2017-04-28     1817     1152
 4:   1 2017-04-27 1404 2017-04-24 2017-04-28     1817     1152
 5:   1 2017-04-28 1152 2017-04-24 2017-04-28     1817     1152
 6:   2 2017-05-11 1344 2017-05-11 2017-05-15     1344      744
 7:   2 2017-05-11 1455 2017-05-11 2017-05-15     1344      744
 8:   2 2017-05-13 1235 2017-05-11 2017-05-15     1344      744
 9:   2 2017-05-14  844 2017-05-11 2017-05-15     1344      744
10:   2 2017-05-15  744 2017-05-11 2017-05-15     1344      744
11:   3 2017-06-16 1439 2017-06-16 2017-06-20     1439     1333
12:   3 2017-06-17 1346 2017-06-16 2017-06-20     1439     1333
13:   3 2017-06-18 1525 2017-06-16 2017-06-20     1439     1333
14:   3 2017-06-20 1211 2017-06-16 2017-06-20     1439     1333
15:   3 2017-06-20 1333 2017-06-16 2017-06-20     1439     1333

Or in one line:

dt[, `:=`(min_time = min(time[which(min_date==date)]), max_time = max(time[which(max_date==date)])), by=grp]

Upvotes: 1

Related Questions