Reputation: 2528
I am trying to convert quarterly data, which is stored in a data.table, into a panel data.frame to prepare it for further analysis. However, there seems to be an issue when using quarterly dates as the time dimension.
I can convert them to Date, numeric or character, but is.pconsecutive() does not recognise the result as a consecutive quarterly time series, which then prevents me from using certain functions.
library(zoo)
library(data.table)
library(plm) # provides pdata.frame() and is.pconsecutive()
dt <- structure(list(Global.Company.Key = c(1380L, 1380L, 1380L, 1380L,
1380L, 1380L, 1380L, 1380L), Calendar.Data.Year.and.Quarter = structure(c(2000,
2000.25, 2000.5, 2000.75, 2001, 2001.25, 2001.5, 2001.75), class = "yearqtr"),
Calendar.Year.Quarter.Integer = c(10957L, 11048L, 11139L,
11231L, 11323L, 11413L, 11504L, 11596L), Year.Date = structure(c(10957,
11048, 11139, 11231, 11323, 11413, 11504, 11596), class = "Date")), .Names = c("Global.Company.Key",
"Calendar.Data.Year.and.Quarter", "Calendar.Year.Quarter.Integer",
"Year.Date"), row.names = c(NA, -8L), class = c("data.table",
"data.frame"))
# define the date index as integer
pdt <- pdata.frame(dt, index = c("Global.Company.Key", "Calendar.Year.Quarter.Integer"))
is.pconsecutive(pdt)
#  1380
# FALSE
Apparently the time dimension is analysed by checking whether the data points are regularly spaced with a distance of one. From the manual: "For evaluation of consecutiveness, the time dimension is interpreted to be numeric, and the data are tested for being a regularly spaced sequence with distance 1 between the time periods for each individual (for each individual the time dimension can be interpreted as sequence t, t+1, t+2, ... where t is an integer)." My integer index counts days since 1970-01-01, so consecutive quarters are roughly 91 apart rather than 1.
So what is the best and most robust way to convert the year-quarter time series?
Upvotes: 0
Views: 664
Reputation: 3687
pdata.frame is not aware of quarterly data, nor of the facilities that packages like zoo provide. The variables serving as the index are coerced to factors.
Looking at what is.pconsecutive does: you need a time variable as an index that is a "meaningful" integer series after the factor is coerced first to character and then to numeric (this is what is.pconsecutive does).
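To see why the detour through character matters, here is a small base-R sketch. The vector `idx` is made up for illustration and the code is not taken from plm's source; it only shows the general behaviour of factors.

```r
# Hypothetical time index stored as a factor (index variables end up as factors)
idx <- factor(c(10957, 11048, 11139))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(idx)

# Going through character first recovers the original values
values <- as.numeric(as.character(idx))

print(codes)
print(values)
```

The level codes always look consecutive, so only the character route tells you what the time values really are.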
For your example, you want an index for which this expression yields a regular sequence with distance 1:
as.numeric(as.character(index(pdt)[[2]]))
For the data in your question you get the following, which is not evaluated as consecutive:
[1] 10957 11048 11139 11231 11323 11413 11504 11596
For the data in your answer you get the following, which is evaluated as consecutive:
[1] 1 2 3 4 5 6 7 8
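The difference between the two vectors can be checked with a simplified stand-in for the rule quoted from the manual. The helper `is_step_one` is a hypothetical name and only an illustration of "distance 1 between time periods", not plm's actual implementation.

```r
# Simplified stand-in for the consecutiveness rule: every step must equal 1.
# This is an illustration, not plm's actual implementation.
is_step_one <- function(x) {
  x <- as.numeric(x)
  length(x) < 2 || all(diff(x) == 1)
}

day_index     <- c(10957, 11048, 11139, 11231, 11323, 11413, 11504, 11596)
quarter_index <- 1:8

diff(day_index)             # gaps of 90 to 92 days between quarter starts
is_step_one(day_index)      # FALSE
is_step_one(quarter_index)  # TRUE
```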
Upvotes: 2
Reputation: 2528
I came up with a solution that is sufficient for my purpose. It hardcodes 2000 as the base year, so it is tailored to this particular dataset and needs adjusting if a different time horizon is covered. I basically express every quarter relative to the first quarter in the dataset, compute an integer for each quarter, and use that as the time index.
library(lubridate)
dt[, Time.Index := (year(Calendar.Data.Year.and.Quarter)-2000)*4+quarter(Calendar.Data.Year.and.Quarter)]
pdt <- pdata.frame(dt , index = c("Global.Company.Key", "Time.Index"))
is.pconsecutive(pdt) # <- this then reports TRUE
It is a workaround, but not so bad I think.
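A variant without the hardcoded base year might look like the sketch below. It assumes that a yearqtr value is stored as year + (quarter - 1)/4, which matches the dput() output in the question; the numbers are copied from there so that the snippet runs in base R without zoo.

```r
# Numeric representation of the yearqtr column, copied from the dput() output
yq <- c(2000, 2000.25, 2000.5, 2000.75, 2001, 2001.25, 2001.5, 2001.75)

# Multiply by 4 so consecutive quarters differ by exactly 1; round() guards
# against floating-point noise before converting to integer
quarter_index <- as.integer(round(yq * 4))

print(quarter_index)
all(diff(quarter_index) == 1)
```

Applied to the data.table, this would presumably be something like dt[, Time.Index := as.integer(round(as.numeric(Calendar.Data.Year.and.Quarter) * 4))], though I have not run that against plm.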
Upvotes: 0