hannes101

Reputation: 2528

Correct way to specify quarterly observations as the time index in the plm package

I am trying to convert quarterly data, stored in a data.table, into a panel data frame (pdata.frame) to prepare it for further analysis. However, there seems to be an issue when quarterly dates are used as the time dimension. I can convert them to Date, numeric or character, but is.pconsecutive() does not recognise the result as a consecutive quarterly series, which then prevents me from using certain functions.

library(plm)        # pdata.frame(), is.pconsecutive()
library(zoo)        # "yearqtr" class
library(data.table)
dt <- structure(list(Global.Company.Key = c(1380L, 1380L, 1380L, 1380L, 
1380L, 1380L, 1380L, 1380L), Calendar.Data.Year.and.Quarter = structure(c(2000, 
2000.25, 2000.5, 2000.75, 2001, 2001.25, 2001.5, 2001.75), class = "yearqtr"), 
    Calendar.Year.Quarter.Integer = c(10957L, 11048L, 11139L, 
    11231L, 11323L, 11413L, 11504L, 11596L), Year.Date = structure(c(10957, 
    11048, 11139, 11231, 11323, 11413, 11504, 11596), class = "Date")), .Names = c("Global.Company.Key", 
"Calendar.Data.Year.and.Quarter", "Calendar.Year.Quarter.Integer", 
"Year.Date"), row.names = c(NA, -8L), class = c("data.table", 
"data.frame"))
# define the time index as an integer (here: days since 1970-01-01)
pdt <- pdata.frame(dt, index = c("Global.Company.Key", "Calendar.Year.Quarter.Integer"))
is.pconsecutive(pdt)
#  1380 
# FALSE 

Apparently the time dimension is checked for being regularly spaced with a distance of one between consecutive observations. From the manual: "For evaluation of consecutiveness, the time dimension is interpreted to be numeric, and the data are tested for being a regularly spaced sequence with distance 1 between the time periods for each individual (for each individual the time dimension can be interpreted as sequence t, t+1, t+2, ... where t is an integer)." So what is the best and most robust way to convert a year-quarter time series?

Upvotes: 0

Views: 664

Answers (2)

Helix123

Reputation: 3687

pdata.frame is not aware of quarterly data, nor of the facilities that packages like zoo provide. The variables serving as the index are coerced to factors.

Looking at what is.pconsecutive does: you need a time index that is a "meaningful" integer series after the factor has been coerced first to character and then to numeric (this is what is.pconsecutive does).

For your example, you want an index for which the following yields a regular sequence with step 1: as.numeric(as.character(index(pdt)[[2]])).

For the data in your question you get:

[1] 10957 11048 11139 11231 11323 11413 11504 11596, which is not evaluated as consecutive.

For the data in your answer you get this:

[1] 1 2 3 4 5 6 7 8, which is evaluated as being consecutive.
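The coercion and the spacing test described above can be reproduced in base R. This is only a simplified sketch of the idea, not plm's actual implementation; the variable names (time_fac, time_num, level_codes, gaps, is_consecutive) are made up for illustration.

```r
# Time values from the question (days since 1970-01-01), held as a factor,
# which is how pdata.frame stores its index variables
time_fac <- factor(c(10957, 11048, 11139, 11231, 11323, 11413, 11504, 11596))

# factor -> character -> numeric recovers the original values
time_num <- as.numeric(as.character(time_fac))

# as.numeric() applied directly to a factor returns the level codes instead,
# which is why the intermediate as.character() step matters
level_codes <- as.numeric(time_fac)

# Consecutive in the manual's sense: every gap between periods equals 1
gaps <- diff(time_num)
is_consecutive <- all(gaps == 1)

print(gaps)
print(is_consecutive)
```

The gaps between the quarter dates are 90 to 92 days rather than 1, so a day-count index cannot pass the test, whereas an index that counts quarters can.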

Upvotes: 2

hannes101

Reputation: 2528

I came up with a solution that is sufficient for this purpose, but it is specific to this dataset: the base year (2000) is hard-coded and needs adjusting if a different time horizon is covered. I basically express every quarter relative to the first quarter in the dataset, which gives an integer for each quarter, and use that as the time index.

library(lubridate)
dt[, Time.Index := (year(Calendar.Data.Year.and.Quarter)-2000)*4+quarter(Calendar.Data.Year.and.Quarter)]
pdt <- pdata.frame(dt , index = c("Global.Company.Key", "Time.Index"))
is.pconsecutive(pdt) # <- this then reports TRUE

It is a workaround, but not a bad one, I think.
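A variant that avoids the hard-coded base year might look like the sketch below. It relies on the fact, visible in the dput output in the question, that a "yearqtr" value is stored as year + (quarter - 1) / 4, so multiplying by 4 gives one integer step per quarter. The names yq and quarter_count are hypothetical, and the object is rebuilt with structure() only so that the snippet runs without zoo.

```r
# Quarterly values as stored by zoo's "yearqtr" class: year + (quarter - 1) / 4
yq <- structure(c(2000, 2000.25, 2000.5, 2000.75, 2001, 2001.25, 2001.5, 2001.75),
                class = "yearqtr")

# Multiply by 4 so that each quarter advances the index by exactly 1;
# round() guards against floating-point noise before the integer conversion
quarter_count <- as.integer(round(as.numeric(yq) * 4))

print(quarter_count)
print(all(diff(quarter_count) == 1L))
```

On the data.table from the question the same expression would be dt[, Time.Index := as.integer(round(as.numeric(Calendar.Data.Year.and.Quarter) * 4))]. The resulting index starts at 8000 rather than 1, which should not matter for is.pconsecutive, since only the spacing between periods is tested.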

Upvotes: 0
