Reputation: 2015
Suppose I have two dataframes,
df1
id time1
1 2016-04-07 21:39:10
1 2016-04-05 11:19:17
2 2016-04-03 10:58:25
2 2016-04-02 21:39:10
df2
id time2
1 2016-04-07 21:39:11
1 2016-04-05 11:19:18
1 2016-04-06 21:39:11
1 2016-04-04 11:19:18
2 2016-04-03 10:58:26
2 2016-04-02 21:39:11
2 2016-04-04 10:58:26
2 2016-04-05 21:39:11
I want to find for each entry in df1, the shortest time difference in df2. Suppose we take the first entry, it has id 1, so I want to loop through df2, filter for id 1, then check the time difference between one entry of df1 and remaining entries of df2 and find the shortest difference and fetch the corresponding entry. My sample output should be
id time time2 diff(in secs)
1 2016-04-07 21:39:10 2016-04-07 21:39:10 1
1 2016-04-05 11:19:17 2016-04-05 11:19:17 1
2 2016-04-03 10:58:25 2016-04-03 10:58:25 1
2 2016-04-02 21:39:10 2016-04-02 21:39:10 1
the following is my try,
for(i in unique(df1$id)){
temp1 = df1[df1$id == i,]
temp2 = df2[df2$id == i,]
for(j in unique(df1$time1){
for(k in unique(df2$time2){
diff = abs(df1$time1[j] - df2$time2[k]
print(diff)}}}
I am not able to progress after this, getting many errors. Can anybody help me in correcting this? May be suggest a better efficient way to do this? Any help would be appreciated.
Update:
Reproducable data:
df1 <- data.frame(
id = c(1,1,2,2),
time1 = c('2016-04-07 21:39:10', '2016-04-05 11:19:17', '2016-04-03 10:58:25', '2016-04-02 21:39:10')
)
df2 <- data.frame(
id = c(1,1,1,1,2,2,2,2),
time2 = c('2016-04-07 21:39:11', '2016-04-05 11:19:18','2016-04-07 21:39:11', '2016-04-05 11:19:18', '2016-04-03 10:58:26', '2016-04-02 21:39:11','2016-04-03 10:58:26', '2016-04-02 21:39:11')
)
df1$time1 = as.POSIXct(df1$time1)
df2$time2 = as.POSIXct(df2$time2)
Upvotes: 1
Views: 199
Reputation: 6969
If you work with data.table
:
library(data.table)
df1 <- data.table(
id = c(1,1,2,2),
time1 = c('2016-04-07 21:39:10', '2016-04-05 11:19:17', '2016-04-03 10:58:25', '2016-04-02 21:39:10')
)
df2 <- data.table(
id = c(1,1,1,1,2,2,2,2),
time2 = c('2016-04-07 21:39:11', '2016-04-05 11:19:18','2016-04-07 21:39:11', '2016-04-05 11:19:18', '2016-04-03 10:58:26', '2016-04-02 21:39:11','2016-04-03 10:58:26', '2016-04-02 21:39:11')
)
df1$time1 = as.POSIXct(df1$time1)
df2$time2 = as.POSIXct(df2$time2)
res <- df1[df2, .(time1, time2), by = .EACHI, on = "id"][, diff:= abs(time2 -time1)]
setkey(res, id, time1, diff)
res <- res[, row := seq_along(.I), by = .(id, time1)][row == 1]
Upvotes: 0
Reputation: 8458
You can also do this in base R. To generate random dates (useful), I borrowed and edited a nice function from elsewhere on StackOverflow:
latemail <- function(N, st="2011/01/01", et="2016/12/31") {
st <- as.POSIXct(as.Date(st))
et <- as.POSIXct(as.Date(et))
dt <- as.numeric(difftime(et,st,unit="sec"))
ev <- sort(runif(N, 0, dt))
return(st + ev)
}
df1 <- data.frame(id=c(1,1,2,2), time1=latemail(4))
df2 <- data.frame(id=c(rep(1,4), rep(2,4)), time2=latemail(8))
And then your answer can be achieved in two lines:
shortest <- sapply(df1$time1, function(x) which(abs(df2$time2 - x) == min(abs(df2$time2 - x))))
cbind(df1, df2[shortest,])
Output:
id time1 id time2
1 2011-10-08 02:00:21 1 2011-08-17 18:07:47
1 2012-05-06 17:49:03 1 2012-09-04 19:52:40
2 2013-10-29 13:14:51 1 2012-10-29 20:09:31
2 2016-06-17 19:23:43 2 2015-11-24 02:07:15
Upvotes: 1
Reputation: 2415
You can achieve this using dplyr
. Basically the idea is since we want to produce an entry we'll assign each element in df1
a new id (in this case I just called it rowname).
After this, all we're interested in is joining the two dataframes on the id
and filter them based on the minimum absolute difference.
library(dplyr)
df1$time1 <- as.POSIXct(as.character(df1$time1))
df2$time2 <- as.POSIXct(as.character(df2$time2))
df1 %>%
add_rownames("rowname") %>%
left_join(df2, "id") %>%
mutate(diff=time2-time1) %>%
group_by(rowname) %>%
filter(min(abs(diff)) == abs(diff)) %>%
distinct
This is my output:
Source: local data frame [4 x 5]
Groups: rowname [4]
rowname id time1 time2 diff
(chr) (dbl) (time) (time) (dfft)
1 1 1 2016-04-07 21:39:10 2016-04-07 21:39:11 1 secs
2 2 1 2016-04-05 11:19:17 2016-04-05 11:19:18 1 secs
3 3 2 2016-04-03 10:58:25 2016-04-03 10:58:26 1 secs
4 4 2 2016-04-02 21:39:10 2016-04-02 21:39:11 1 secs
Upvotes: 2