eun lee
eun lee

Reputation: 145

How to use which.min from another table by row

I try to get data from another data.frame which has minimum value.

There are two kinds of the datasets which are card and log.

card

#    id      BSTN    ASTN    BSEC    ASEC
#    201557  151     150     60633   61302
#    201558  151     150     60159   60680
#    201559  151     150     44757   45149
#    201560  151     150     77551   77923
#    201561  151     150     61160   61606

log

    TRNID   ASTN    TIME
#    1   150     45140
#    2   150     61300
#    3   150     61600
#    4   150     68570
#    5   150     77900
#    6   150     79125
#    7   150     82477
#    8   150     82767

What I want to is that get the log$TRNID and log$TIMElog$TIME data to the card data with following function log[which.min(card$ASEC-log$TIME)]

With the for loop, it takes too long time since it calculates row by row. Is there any calculation without for loop?

The final result of data.frame would be like this:

#    id      BSTN    ASTN    BSEC    ASEC    TRNID   TIME
#    201557  151     150     60633   61302   2       61300
#    201558  151     150     60159   60680   2       61300
#    201559  151     150     44757   45149   1       45140
#    201560  151     150     77551   77923   5       77900
#    201561  151     150     61160   61606   3       61600

Upvotes: 0

Views: 57

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388982

One way using base R could be, for each ASEC we find the index of minimum value TIME entry in log and using that index return the corresponding TRNID and TIME value and add it to original dataframe of card.

card[c("TRNID", "TIME")] <- do.call("rbind", lapply(card$ASEC, function(x) {
               inds <- log$TIME - x 
               idx <- which(inds %in% max(inds[inds < 0]))
               c(log$TRNID[idx], log$TIME[idx])
}))

card

#      id BSTN ASTN  BSEC  ASEC TRNID  TIME
#1 201557  151  150 60633 61302     2 61300
#2 201558  151  150 60159 60680     1 45140
#3 201559  151  150 44757 45149     1 45140
#4 201560  151  150 77551 77923     5 77900
#5 201561  151  150 61160 61606     3 61600

Upvotes: 1

mt1022
mt1022

Reputation: 17289

Another way is rolling join:

library(data.table)
setDT(log)
setDT(card)

log[, ASEC := TIME]
res <- log[card, roll = -Inf, on = .(ASTN, ASEC)]

# >res
#
#    TRNID ASTN  TIME  ASEC     id BSTN  BSEC
# 1:     2  150 61350 61302 201557  151 60633
# 2:     2  150 61350 60680 201558  151 60159
# 3:     1  150 46140 45149 201559  151 44757
# 4:     5  150 77950 77923 201560  151 77551
# 5:     3  150 61650 61606 201561  151 61160

Rolling join will find an interval in log for every value of last join column (ASEC) in card. -Inf means the next observation of log will be used to match the value in card.

Upvotes: 2

r2evans
r2evans

Reputation: 160447

Since you need to compare the (absolute value) difference between all card$ASEC and log$TIME, I think an outer call is most useful here:

outer(card$ASEC, log$TIME, `-`)
#       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
# [1,] 15162    -48   -348  -7268 -16648 -17823 -21175 -21465
# [2,] 14540   -670   -970  -7890 -17270 -18445 -21797 -22087
# [3,]  -991 -16201 -16501 -23421 -32801 -33976 -37328 -37618
# [4,] 31783  16573  16273   9353    -27  -1202  -4554  -4844
# [5,] 15466    256    -44  -6964 -16344 -17519 -20871 -21161

We can quickly reduce this (per row) to find the minimum absolutely value:

( ind <- apply(abs(outer(card$ASEC, log$TIME, `-`)), 1, which.min) )
# [1] 2 2 1 5 3

cbind.data.frame(card, log[ind,], stringsAsFactors=FALSE)
#         id BSTN ASTN  BSEC  ASEC TRNID ASTN  TIME
# 2   201557  151  150 60633 61302     2  150 61350
# 2.1 201558  151  150 60159 60680     2  150 61350
# 1   201559  151  150 44757 45149     1  150 46140
# 5   201560  151  150 77551 77923     5  150 77950
# 3   201561  151  150 61160 61606     3  150 61650

Upvotes: 1

Related Questions