Gabriel
Gabriel

Reputation: 443

Finding Interval in data.table

I have a data.table A:

A = data.table(firmID = c("89031", "89031", "89031", "89031", "89031", "89031"), date = c("20170403","20170404", "20170405", "20170406", "20170407", "20170408"), TICKER = c("TSAR", "TSAR", "TSAR", "AAXN", "AAXN", "AAXN" ) )

Here it contains column "firmID", "date", "TICKER".

Now, firmID tracks the same company even their TICKER changed. Here we see TICKER changes from AAXN to TSAR starting from 20170426.

I want to make a table B like this:

B = data.table(firmID = c("89031", "89031"), STARTdate = c("20170403", "20170406"), ENDdate = c("20170405", "20170408"), TICKER = c("TSAR", "AAXN"))

So that it has a time interval when the TICKER is called respectively. This table can then be used to merge with another dataset which does not have firmID using foverlaps() function.

Any idea how one can do this without writing a for loop and compare each record to the previous one etc?

Thanks.

Upvotes: 0

Views: 64

Answers (1)

younggeun
younggeun

Reputation: 953

Given your A data.table, getting minimum and maximum of date would give each STARTdate and ENDdate.

If you group(by) both TICKER and firmID, you might get the result you want.

library(data.table)
A[,
  .(STARTdate = min(date), ENDdate = max(date)), # range of date
  by = .(TICKER, firmID)]

#>    TICKER firmID STARTdate  ENDdate
#> 1:   TSAR  89031  20170403 20170405
#> 2:   AAXN  89031  20170406 20170408

Upvotes: 1

Related Questions