sdittmar

Reputation: 365

Is the rbind error with xts time series a bug or a feature?

I have two hourly xts time series that I aggregate to daily periodicity with apply.daily. In the same call I select one column from the data. When I then combine the two daily time series with rbind, I receive an error.

I already found a solution, but I was curious whether the behavior is expected or not.

Here is some code to reproduce the error in R version 3.5.2 (Linux Debian) and xts_0.11-2:

library(xts)

data1 <- xts(matrix(1:144, ncol = 2), as.POSIXct("2019-05-09 00:00:00") -
          seq.int(60*60, by = 60*60, length.out = 72))
data2 <- xts(matrix(1:144, ncol = 2), as.POSIXct("2019-05-05 00:00:00") -
          seq.int(60*60, by = 60*60, length.out = 72))

colnames(data1) <- c("col1", "col2")
colnames(data2) <- c("col1", "col2")

data1.daily <- apply.daily(data1[,"col1"], colSums)
data2.daily <- apply.daily(data2[,"col1"], colSums)

data.daily <- rbind(data1.daily, data2.daily)

Causes the following error:

Error in rbind(deparse.level, ...) : length of 'dimnames' [1]
 not equal to array extent

The main culprit seems to be the first dimnames entry, chr [1:3] "col1" "col1" "col1", which holds the row names and looks odd to me:

str(data1.daily)

An ‘xts’ object on 2019-05-06 23:00:00/2019-05-08 23:00:00 containing:
  Data: num [1:3, 1] 1452 876 300
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3] "col1" "col1" "col1"
  ..$ : chr "col1"
  Indexed by objects of class: [POSIXct,POSIXt] TZ: 
  xts Attributes:  
 NULL

I can easily solve the problem by reversing the steps:

data <- rbind(data1, data2)
data.daily <- apply.daily(data[,"col1"], colSums)

But I would prefer to store the data once it has a lower frequency.

So the question is not how to solve the problem, but whether this is a bug or a subsetting feature that serves some other purpose.

Upvotes: 2

Views: 123

Answers (2)

AkselA

Reputation: 8846

I'm not entirely sure what's happening, but removing the row names from the first xts object seems to fix it.

rownames(data1.daily) <- NULL
rbind(data1.daily, data2.daily)
#                     col1
# 2019-05-02 23:00:00 1452
# 2019-05-03 23:00:00  876
# 2019-05-04 23:00:00  300
# 2019-05-06 23:00:00 1452
# 2019-05-07 23:00:00  876
# 2019-05-08 23:00:00  300

On closer inspection, the cleaner fix is to use sum(), not colSums(), in apply.daily().

data1.daily <- apply.daily(data1[,"col1"], sum)
data2.daily <- apply.daily(data2[,"col1"], sum)
rbind(data1.daily, data2.daily)
#                     col1
# 2019-05-02 23:00:00 1452
# 2019-05-03 23:00:00  876
# 2019-05-04 23:00:00  300
# 2019-05-06 23:00:00 1452
# 2019-05-07 23:00:00  876
# 2019-05-08 23:00:00  300

The error appears to originate in apply.daily() (or really period.apply()), when the underlying sapply() call returns a named vector. These names later end up as row names. I wouldn't call this a bug, as using colSums() on a single column doesn't make much sense. It should be quite easy, though, to make the function more resilient to errors like this, if that was wanted, but that's up to Joshua.
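To illustrate the mechanism described above without xts, here is a minimal base R sketch. It assumes that period.apply() transposes the sapply() result into a one-column matrix, which I have not verified against the xts source; the matrix x and the list groups are made up for the illustration.

```r
# Hypothetical stand-in for one xts column: a 6 x 1 matrix named "col1"
x <- matrix(1:6, ncol = 1, dimnames = list(NULL, "col1"))

# Hypothetical stand-in for the daily periods: three groups of row indices
groups <- list(1:2, 3:4, 5:6)

# colSums() returns a named length-1 vector per group, so sapply() keeps the names
named_result <- sapply(groups, function(idx) colSums(x[idx, , drop = FALSE]))

# sum() returns an unnamed scalar per group, so the result has no names
plain_result <- sapply(groups, function(idx) sum(x[idx, , drop = FALSE]))

# Assumed reshaping step: turn each vector into a one-column matrix
m_named <- t(t(named_result))
m_plain <- t(t(plain_result))

print(rownames(m_named))  # the names carried over from colSums()
print(rownames(m_plain))  # no row names
```

With colSums() the repeated column name survives as row names of the reshaped matrix, whereas with sum() there are no names to carry over, which matches the dimnames seen in str(data1.daily).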

Upvotes: 2

jay.sf

Reputation: 73164

You could write a function for this purpose. I'm not sure, though, whether you need the data as a data.frame or as something else. Anyway, I'll provide the former since it was quite a challenge.

res <- do.call(rbind, lapply(list(data1.daily, data2.daily), function(x) {
  t <- as.POSIXct(attr(x, "index"), 
                  origin="1970-01-01")
  value <- as.numeric(x$col1)
  return(data.frame(t, value))
}))
res
#                     t value
# 1 2019-05-06 23:00:00  1452
# 2 2019-05-07 23:00:00   876
# 3 2019-05-08 23:00:00   300
# 4 2019-05-02 23:00:00  1452
# 5 2019-05-03 23:00:00   876
# 6 2019-05-04 23:00:00   300

class(res)
# [1] "data.frame"

Upvotes: 0
