user2506086
user2506086

Reputation: 513

Simultaneous join and update subject to a where clause on two data.tables

I am trying to use the data.table package to update a data.table y depending on how columns in y relate to columns in x, where the two dataframes can be mapped together by an id.

Consider the following two data.tables:

x <- data.table(id=c(1,2,3),status=c(0,1,1),xend=c(2,4,7))
y <- data.table(id=c(1,1,2,2,3,3),yend=c(2,2,3,5,6,8))
setkey(x,id)
setkey(y,id)

Now in y I want to update the column yend when status == 1 and yend > xend so that yend becomes xend. Of course the following notation does not perform the operation, because I am mixing a data.table with an expression, but I thought it would help to show what I would like to do.

y[x[status==1] & yend>xend,yend:=xend]

What is the simplest way to do this?

Upvotes: 4

Views: 199

Answers (1)

Arun
Arun

Reputation: 118859

Using NOCB (next observation carried forward) rolling joins:

# v1.9.6
y[x[status == 1L], yend := i.xend, on=c(id="id", yend="xend"), roll=-Inf]

Also have a look at rollends argument if you want to roll along the edges of each group.

There's no need to setkey() anymore, with the new on= argument implemented that allows for adhoc joins (as subsets).

From the next version, on=c("id", yend="xend") should be sufficient.


Note that this'll only replace the first yend value that is > xend. To replace all values, we'd need to perform a non-equi join, which is not yet implemented, but something we're looking into.

Until then, you can use pmin as suggested by @Frank, but you'll need by=.EACHI:

y[x[status == 1L], yend := pmin(yend, i.xend), on="id", by=.EACHI]

Upvotes: 6

Related Questions