Sinnombre
Sinnombre

Reputation: 438

R data.table updating a table based on an inner join with another table

I need to update a subset of a table based on information in another table:

main_dt = data.table(ID = 1:3, flag=c(TRUE,TRUE,FALSE), treatment_id = c(1,2,1), total_cost = 0)

costs_dt = data.table(treatment_id = 1:3, treatment_cost = c(3,5,8))

main_dt

      ID   flag treatment_id total_cost
   <int> <lgcl>        <num>      <num>
1:     1   TRUE            1          0
2:     2   TRUE            2          0
3:     3  FALSE            1          0

cost_dt

   treatment_id treatment_cost
          <int>          <num>
1:            1              3
2:            2              5
3:            3              8

I would like to end up with:

> main_dt
      ID   flag treatment_id total_cost
   <int> <lgcl>        <num>      <num>
1:     1   TRUE            1          3
2:     2   TRUE            2          5
3:     3  FALSE            1          0

I've been trying to do something like:

main_dt[flag==TRUE,][costs_dt, total_cost := total_cost + treatment_cost, on="treatment_ID", nomatch=NULL]

But this is getting an error because it has both a := assignment and a nomatch. If I remove the nomatch it still doesn't seem to work. Obviously I could join without the flag==TRUE, but then the row I don't want updated will be. Also, if I don't do an inner join I end up with a row with a bunch of NAs for treatment_id = 3.

I feel like this should be totally trivial but I am struggling to figure it out...

Upvotes: 3

Views: 60

Answers (2)

thelatemail
thelatemail

Reputation: 93938

If you add a flag = TRUE to the costs_dt dataset, and then include it in the join, you will only update the matching rows.

main_dt[costs_dt[, c(.SD, flag=TRUE)], on=.(treatment_id, flag),
        total_cost := total_cost + i.treatment_cost]
main_dt
#      ID   flag treatment_id total_cost
#   <int> <lgcl>        <num>      <num>
#1:     1   TRUE            1          3
#2:     2   TRUE            2          5
#3:     3  FALSE            1          0

Upvotes: 2

jay.sf
jay.sf

Reputation: 73562

You could use fifelse.

> library(data.table)
> main_dt[costs_dt, on = "treatment_id", total_cost := fifelse(flag, total_cost + treatment_cost, total_cost)]
> main_dt
      ID   flag treatment_id total_cost
   <int> <lgcl>        <num>      <num>
1:     1   TRUE            1          3
2:     2   TRUE            2          5
3:     3  FALSE            1          0

Upvotes: 2

Related Questions