Laurence_jj
Laurence_jj

Reputation: 726

R data.table two stage lookup

Example data:

set.seed(33)    
a = data.table(id = c(rep(1,5), rep(3,5)),
                  date = rep(seq(Sys.Date()-4, Sys.Date(), by = "day"), 2),
                  value = c(sample(seq(11,43,0.1), 5), sample(seq(201,273,0.1), 5)))
b = data.table(group = c(1, 1, 2),
                   id = c(1, 3,  5))
c = data.table(date=seq(Sys.Date()-4, Sys.Date(), by = "day"),
                   group = 1)

or

>a
    id       date value
 1:  1 2020-08-20  22.8
 2:  1 2020-08-21  12.0
 3:  1 2020-08-22  21.8
 4:  1 2020-08-23  32.6
 5:  1 2020-08-24  24.2
 6:  3 2020-08-20 242.2
 7:  3 2020-08-21 257.8
 8:  3 2020-08-22 255.0
 9:  3 2020-08-23 269.0
10:  3 2020-08-24 263.4

> b
   group id
1:     1  1
2:     1  3
3:     2  5

> c
         date group
1: 2020-08-20     1
2: 2020-08-21     1
3: 2020-08-22     1
4: 2020-08-23     1
5: 2020-08-24     1

I would like to join table a to table b and the temporary result to table c.

Another interpretation is that for each group in table c I would like to sum all the daily values of companies (id's) in the corresponding table b using the daily prices in table a.

e.g.

table c row 1 is group 1 on the 20th August 2020, so this corresponds to company id's 1 and 3 in table b, which have corresponding values of 22.8 and 242.2.

So the value for table c row 1 should be 22.8+242.2 = 265.0.

Upvotes: 0

Views: 49

Answers (1)

s_baldur
s_baldur

Reputation: 33488

Here is one option:

a[, group := b[.SD, on = "id", group]]
c <- a[c, on = c("group", "date"), sum(value), by = .EACHI]
#    group       date    V1
# 1:     1 2020-08-20 272.5
# 2:     1 2020-08-21 261.1
# 3:     1 2020-08-22 312.2
# 4:     1 2020-08-23 254.0
# 5:     1 2020-08-24 227.4

Without modifying a:

c <- b[a, on = "id"][c, on = c("group", "date"), sum(value), by = .EACHI]

Updating c only by reference:

a[, group := b[.SD, on = "id", group]]
c[, V1 := a[.SD, on = c("group", "date"), sum(value), by = .EACHI]$V1]

Reproducible data (although not really because of the use of Sys.Date()):

set.seed(33)
a = data.table(id = c(rep(1,5), rep(3,5)),
              date = rep(seq(Sys.Date()-4, Sys.Date(), by = "day"), 2),
              value = c(sample(seq(11,43,0.1), 5), sample(seq(201,273,0.1), 5)))
b = data.table(group = c(1, 1, 2),
               id = c(1, 3,  5))
c = data.table(date=seq(Sys.Date()-4, Sys.Date(), by = "day"),
               group = 1)

Upvotes: 2

Related Questions