Reputation: 157
I am trying to subset/ match data by groups from 2 data.tables and cannot figure out how do this is in R. I have the following data.table that has a City_ID and a time stamp (column name=Time).
Library(data.table)
timetable <- data.table(City_ID=c("12","9"),
Time=c("12-29-2013-22:05:03","12-29-2013-11:59:00"))
I have a second data.table with several observation for cities and time stamps (plus additional data). The table looks like this:
DT = data.table(City_ID =c("12","12","12","9","9","9"),
Time= c("12-29-2013-13:05:13","12-29-2013-22:05:03",
"12-28-2013-13:05:13","12-29-2013-11:59:00",
"01-30-2013-10:05:03","12-28-2013-13:05:13"),
Other=1:6)
Now I need to find the observations for each city in DT that have a Time >= Time in the other data.table "timetable" (which is basically the matchtable). Only those records should be kept (including the columns that are not used for the calculation; in the example column "other"). The result I want looks like this:
desiredresult = data.table(City_ID=c("12","9"),
Time= c("12-29-2013-22:05:03","12-29-2013-11:59:00"),
Other=c("2","4"))
I have tried the following:
setkey(DT, City_ID, Time)
setkey(timetable, City_ID)
failedresult = DT[,Time >= timetable[Time], by=City_ID]
failedresult2 = DT[,Time >= timetable, by=City_ID]
BTW: I know it may be better to additionally split date and time, but this may make the example even more complex (and when I tested finding a minimum in the time stamps through data.table, it seemed to work).
Upvotes: 3
Views: 1474
Reputation: 81683
Here's an approach for this task:
# 1) transform string to POSIXct object
DT[ , Time := as.POSIXct(strptime(Time, "%m-%d-%Y-%X"))]
timetable[ , Time := as.POSIXct(strptime(Time, "%m-%d-%Y-%X"))]
# 2) set key
setkey(DT, City_ID)
setkey(timetable, City_ID)
# 3) join tables
DT2 <- DT[timetable]
# 4) extract rows and columns
DT2[Time >= Time.1, .SD, .SDcols = names(DT)]
# City_ID Time Other
# 1: 12 2013-12-29 22:05:03 2
# 2: 9 2013-12-29 11:59:00 4
Upvotes: 3