Reputation: 145
I try to get data from another data.frame which has minimum value.
There are two kinds of the datasets which are card and log.
card
# id BSTN ASTN BSEC ASEC
# 201557 151 150 60633 61302
# 201558 151 150 60159 60680
# 201559 151 150 44757 45149
# 201560 151 150 77551 77923
# 201561 151 150 61160 61606
log
TRNID ASTN TIME
# 1 150 45140
# 2 150 61300
# 3 150 61600
# 4 150 68570
# 5 150 77900
# 6 150 79125
# 7 150 82477
# 8 150 82767
What I want to is that get the log$TRNID and log$TIMElog$TIME data to the card data with following function log[which.min(card$ASEC-log$TIME)]
With the for loop, it takes too long time since it calculates row by row. Is there any calculation without for loop?
The final result of data.frame would be like this:
# id BSTN ASTN BSEC ASEC TRNID TIME
# 201557 151 150 60633 61302 2 61300
# 201558 151 150 60159 60680 2 61300
# 201559 151 150 44757 45149 1 45140
# 201560 151 150 77551 77923 5 77900
# 201561 151 150 61160 61606 3 61600
Upvotes: 0
Views: 57
Reputation: 388982
One way using base R could be, for each ASEC
we find the index of minimum value TIME
entry in log
and using that index return the corresponding TRNID
and TIME
value and add it to original dataframe of card
.
card[c("TRNID", "TIME")] <- do.call("rbind", lapply(card$ASEC, function(x) {
inds <- log$TIME - x
idx <- which(inds %in% max(inds[inds < 0]))
c(log$TRNID[idx], log$TIME[idx])
}))
card
# id BSTN ASTN BSEC ASEC TRNID TIME
#1 201557 151 150 60633 61302 2 61300
#2 201558 151 150 60159 60680 1 45140
#3 201559 151 150 44757 45149 1 45140
#4 201560 151 150 77551 77923 5 77900
#5 201561 151 150 61160 61606 3 61600
Upvotes: 1
Reputation: 17289
Another way is rolling join:
library(data.table)
setDT(log)
setDT(card)
log[, ASEC := TIME]
res <- log[card, roll = -Inf, on = .(ASTN, ASEC)]
# >res
#
# TRNID ASTN TIME ASEC id BSTN BSEC
# 1: 2 150 61350 61302 201557 151 60633
# 2: 2 150 61350 60680 201558 151 60159
# 3: 1 150 46140 45149 201559 151 44757
# 4: 5 150 77950 77923 201560 151 77551
# 5: 3 150 61650 61606 201561 151 61160
Rolling join will find an interval in log
for every value of last join column (ASEC
) in card
. -Inf
means the next observation of log
will be used to match the value in card
.
Upvotes: 2
Reputation: 160447
Since you need to compare the (absolute value) difference between all card$ASEC
and log$TIME
, I think an outer
call is most useful here:
outer(card$ASEC, log$TIME, `-`)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 15162 -48 -348 -7268 -16648 -17823 -21175 -21465
# [2,] 14540 -670 -970 -7890 -17270 -18445 -21797 -22087
# [3,] -991 -16201 -16501 -23421 -32801 -33976 -37328 -37618
# [4,] 31783 16573 16273 9353 -27 -1202 -4554 -4844
# [5,] 15466 256 -44 -6964 -16344 -17519 -20871 -21161
We can quickly reduce this (per row) to find the minimum absolutely value:
( ind <- apply(abs(outer(card$ASEC, log$TIME, `-`)), 1, which.min) )
# [1] 2 2 1 5 3
cbind.data.frame(card, log[ind,], stringsAsFactors=FALSE)
# id BSTN ASTN BSEC ASEC TRNID ASTN TIME
# 2 201557 151 150 60633 61302 2 150 61350
# 2.1 201558 151 150 60159 60680 2 150 61350
# 1 201559 151 150 44757 45149 1 150 46140
# 5 201560 151 150 77551 77923 5 150 77950
# 3 201561 151 150 61160 61606 3 150 61650
Upvotes: 1