jf328
jf328

Reputation: 7351

data.table rolling join for each group

How can I join two data table with rolling join for each group?

library(data.table)
alldates = as.Date(c('2000-01-01','2005-01-01','2010-01-01','2015-01-01','2020-01-01'))
gdp = data.table(date = alldates[c(1,3,5,1,3,5)], country = c('A','A','A','B','B','B'), value = c(1,10,100, 2, 20, 200))
gdp
   date country value
1: 2000-01-01       A     1
2: 2010-01-01       A    10
3: 2020-01-01       A   100
4: 2000-01-01       B     2
5: 2010-01-01       B    20
6: 2020-01-01       B   200

price = data.table(date = alldates, price = c(101, 102, 103, 104, 105))
price
   date price
1: 2000-01-01   101
2: 2005-01-01   102 # gdp table is missing mid decade data
3: 2010-01-01   103
4: 2015-01-01   104
5: 2020-01-01   105

result I want

         date country value price
1: 2000-01-01       A     1   101
2: 2000-01-01       B     2   101
3: 2005-01-01       A     1   102 # fill in value using previous gdp for each country
4: 2005-01-01       B     2   102
5: 2010-01-01       A    10   103
6: 2010-01-01       B    20   103
7: 2015-01-01       A    10   104
8: 2015-01-01       B    20   104
9: 2020-01-01       A   100   105
10: 2020-01-01      B   200   105

NB

  1. Row order doesn't matter
  2. Doesn't need to be one-liner
  3. gdp[price, on = 'date', roll = TRUE] doesn't work

Upvotes: 2

Views: 506

Answers (1)

Frank
Frank

Reputation: 66819

After cleaning the data...

# fill in missing levels
gdpf = gdp[CJ(date = price$date, country = country, unique = TRUE), on=.(date, country)]

# fill in values for missing levels
gdpf[order(country), value := first(value), by=.(country, cumsum(!is.na(value)))]

An update join can then grab the prices:

gdpf[price, on=.(date), price := i.price ]

          date country value price
 1: 2000-01-01       A     1   101
 2: 2000-01-01       B     2   101
 3: 2005-01-01       A     1   102
 4: 2005-01-01       B     2   102
 5: 2010-01-01       A    10   103
 6: 2010-01-01       B    20   103
 7: 2015-01-01       A    10   104
 8: 2015-01-01       B    20   104
 9: 2020-01-01       A   100   105
10: 2020-01-01       B   200   105

Another way to fill down values for missing levels is value := na.locf(value), by=country using the zoo package.

Upvotes: 4

Related Questions