Reputation: 1783
I am trying to join two tables in R using dplyr, in this way:
Table 1:
ID CHAR1 CHAR2
01 xyz abc
02 abc xyz
etc...
Table 2:
ID YEAR VALUE1 STATUS
01 2012 455 T
01 2013 14 B
01 2014 234 C
01 2015 17 T
02 2014 213 B
02 2015 456 B
02 2016 17 B
I have a separate variable called MODEL_YEAR, which in this example is set to 2015.
I want my new table to look like this:
ID CHAR1 CHAR2 VALUE_Tminus1 VALUE_Tminus2 STATUS_Tminus1 STATUS_Tminus2
01 xyz abc 234 14 C B
02 abc xyz 213 NA B NA
The new columns go back in time from the MODEL_YEAR. For example, VALUE_Tminus1 takes VALUE1 from 2014. I would like the new table to go back 10 years, with NA occurring where no data is available.
So far I have tried to do this my adding MODEL_YEAR as a column in Table 1, and then doing a left join Table 1 to Table2, like this:
left_join(Table_1, Table_2, by=c("ID"="ID", "MODEL_YEAR"=("YEAR"-1))
However this doesn't work, as I can't subtract 1 from "YEAR" in this way. I think I'd possibly be able to do this by adding lots of new columns and doing multiple joins, but am not sure and wondered if there's a neater way? Perhaps using data.table - I know this can be used for joins, but I'm unfamiliar with it.
Many thanks
Upvotes: 0
Views: 1046
Reputation: 33498
Since it has the tag, here is a data.table
solution:
library(data.table)
library(magrittr) # For readability only
MODEL_YEAR <- 2015L
tab1 <- fread(
'ID CHAR1 CHAR2
01 xyz abc
02 abc xyz',
colClasses = 'character'
)
tab2 <- fread(
'ID YEAR VALUE1 STATUS
01 2012 455 T
01 2013 14 B
01 2014 234 C
01 2015 17 T
02 2014 213 B
02 2015 456 B
02 2016 17 B',
colClasses = c('character', 'integer', 'integer', 'character')
)
setkey(tab1, ID)
setkey(tab2, ID, YEAR)
tab2[CJ(ID, YEAR = seq(MODEL_YEAR - 1, MODEL_YEAR - 5), unique = TRUE)] %>%
.[, YEAR := paste0('Tminus', MODEL_YEAR - YEAR)] %>%
dcast(ID ~ YEAR, value.var = c('VALUE1', 'STATUS')) %>%
tab1[.]
# ID CHAR1 CHAR2 VALUE1_Tminus1 VALUE1_Tminus2 VALUE1_Tminus3 VALUE1_Tminus4 VALUE1_Tminus5 STATUS_Tminus1 STATUS_Tminus2 STATUS_Tminus3 STATUS_Tminus4 STATUS_Tminus5
# 1: 01 xyz abc 234 14 455 NA NA C B T <NA> <NA>
# 2: 02 abc xyz 213 NA NA NA NA B <NA> <NA> <NA> <NA>
To get more lags just change the number 5 to anything you want.
Upvotes: 2