verybadatthis
verybadatthis

Reputation: 1458

How do I match entries in one data table to another by time series?

I would like to merge two time-series datatables in R, but want the entries of one datatable to be merged to the entry which comes just before it in the prior dataframe.

An example of this would be trying to match location data to purchase data, to find out where people made each purchase. I have created two example datasets to illustrate this. One has an observation for each location change which consists of the date of the location change and the id of the person. One has an observation for each purchase which consists of the date of the purchase and the id of the person. What I want is to merge the two so I know what the most recent location change before each purchase was so I can figured out where purchases occurred.

Create example dataset:

require(data.table)
df.loc = data.frame(date.loc = sample(seq(as.Date("2010-01-01"), by=1, length.out=50),50), ID = rep(letters[seq(1:5)],10))

df.pur = data.frame(date.pur = sample(seq(as.Date("2010-01-01"), by=1, length.out=50),50), ID = rep(letters[seq(1:5)],10))

#convert to a data.table for later analysis
dt.loc = as.data.table(df.loc)
dt.pur = as.data.table(df.pur)

Sample data example:

head(dt.loc)

date.loc ID
2010-01-14  a
2010-01-19  b
2010-01-28  c
2010-02-12  d
2010-01-10  e
2010-02-10  a

Thank you!

Upvotes: 1

Views: 167

Answers (1)

verybadatthis
verybadatthis

Reputation: 1458

This can be done using data.table's rolling join function. Essentially, this will match each observation for an ID in the purchases datatable to the observation in the locations datatable for that ID immediately prior to it. If the first entry in the group is a purchase, then I will assume that the location entry for the ID immediately following it shoul dbe used instead.

#load required packages
require(data.table)
require(zoo)

#add a rolling date columns to the location datatable
dt.loc[, rollDate := date.loc]

#rename the purchase column date to rollDate to allow for a merge
setnames(dt.pur,"date.pur","rollDate")

#set keys, ordering the dataframe by ID and then date
keys_rate = c("ID", "rollDate")
keys_bill = c("ID", "rollDate")
setkeyv(dt.loc, keys_loc)
setkeyv(dt.pur, keys_pur)

#Rolling merge, finding the location observation immediately prior to the purchase observatoin
dt.merge = dt.loc[dt.pur, roll=TRUE]

#Fix the NAs, by assuming that if a purchase observation has no rate observations before it, it uses the location immediately after it
dt.merge[, date.loc := na.locf(date.loc, fromLast=TRUE)]

(Answering this for myself, since the last time I looked there was no explanation for this on stackoverflow, and I could have used it!)

Here is a good explanation of rolling joins in data.table more generally.

Upvotes: 2

Related Questions