Reputation: 7351
How can I join two data table with rolling join for each group?
library(data.table)
alldates = as.Date(c('2000-01-01','2005-01-01','2010-01-01','2015-01-01','2020-01-01'))
gdp = data.table(date = alldates[c(1,3,5,1,3,5)], country = c('A','A','A','B','B','B'), value = c(1,10,100, 2, 20, 200))
gdp
date country value
1: 2000-01-01 A 1
2: 2010-01-01 A 10
3: 2020-01-01 A 100
4: 2000-01-01 B 2
5: 2010-01-01 B 20
6: 2020-01-01 B 200
price = data.table(date = alldates, price = c(101, 102, 103, 104, 105))
price
date price
1: 2000-01-01 101
2: 2005-01-01 102 # gdp table is missing mid decade data
3: 2010-01-01 103
4: 2015-01-01 104
5: 2020-01-01 105
result I want
date country value price
1: 2000-01-01 A 1 101
2: 2000-01-01 B 2 101
3: 2005-01-01 A 1 102 # fill in value using previous gdp for each country
4: 2005-01-01 B 2 102
5: 2010-01-01 A 10 103
6: 2010-01-01 B 20 103
7: 2015-01-01 A 10 104
8: 2015-01-01 B 20 104
9: 2020-01-01 A 100 105
10: 2020-01-01 B 200 105
NB
gdp[price, on = 'date', roll = TRUE]
doesn't workUpvotes: 2
Views: 506
Reputation: 66819
After cleaning the data...
# fill in missing levels
gdpf = gdp[CJ(date = price$date, country = country, unique = TRUE), on=.(date, country)]
# fill in values for missing levels
gdpf[order(country), value := first(value), by=.(country, cumsum(!is.na(value)))]
An update join can then grab the prices:
gdpf[price, on=.(date), price := i.price ]
date country value price
1: 2000-01-01 A 1 101
2: 2000-01-01 B 2 101
3: 2005-01-01 A 1 102
4: 2005-01-01 B 2 102
5: 2010-01-01 A 10 103
6: 2010-01-01 B 20 103
7: 2015-01-01 A 10 104
8: 2015-01-01 B 20 104
9: 2020-01-01 A 100 105
10: 2020-01-01 B 200 105
Another way to fill down values for missing levels is value := na.locf(value), by=country
using the zoo package.
Upvotes: 4