Reputation: 443
I have a data.table A:
A = data.table(firmID = c("89031", "89031", "89031", "89031", "89031", "89031"), date = c("20170403","20170404", "20170405", "20170406", "20170407", "20170408"), TICKER = c("TSAR", "TSAR", "TSAR", "AAXN", "AAXN", "AAXN" ) )
Here it contains column "firmID", "date", "TICKER".
Now, firmID tracks the same company even their TICKER changed. Here we see TICKER changes from AAXN to TSAR starting from 20170426.
I want to make a table B like this:
B = data.table(firmID = c("89031", "89031"), STARTdate = c("20170403", "20170406"), ENDdate = c("20170405", "20170408"), TICKER = c("TSAR", "AAXN"))
So that it has a time interval when the TICKER is called respectively. This table can then be used to merge with another dataset which does not have firmID using foverlaps() function.
Any idea how one can do this without writing a for loop and compare each record to the previous one etc?
Thanks.
Upvotes: 0
Views: 64
Reputation: 953
Given your A
data.table, getting minimum and maximum of date
would give each STARTdate
and ENDdate
.
If you group(by
) both TICKER
and firmID
, you might get the result you want.
library(data.table)
A[,
.(STARTdate = min(date), ENDdate = max(date)), # range of date
by = .(TICKER, firmID)]
#> TICKER firmID STARTdate ENDdate
#> 1: TSAR 89031 20170403 20170405
#> 2: AAXN 89031 20170406 20170408
Upvotes: 1