antonio

Reputation: 11150

Using zero-length xts objects

I am confused by zero-length xts objects with non-zero width, which for convenience I will call empty xts objects.
I think they are a good way to model securities for which there are no observations, e.g. delisted securities.

library(xts)
x=xts(matrix(numeric(0), dimnames=list(NULL, "Delist1")), as.Date(numeric(0)))
x
#      Delist1
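To make "zero-length, non-zero width" concrete, the dimensions of such an object can be inspected directly. This is a small sketch. It builds the same object as above but uses `as.Date(character(0))` so that it does not depend on a default `origin` for numeric dates:

```r
library(xts)

# Zero rows (no timestamps) but one named column
x <- xts(matrix(numeric(0), dimnames = list(NULL, "Delist1")),
         as.Date(character(0)))

dim(x)        # number of rows (length) and columns (width)
colnames(x)   # the column name is kept even though there are no observations
```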

However, when it comes to merging empty securities, they simply disappear:

y=xts(1:3,  as.Date(1:3))    
names(y)="List1"
merge(x,y)
#             List1
# 1970-01-02      1
# 1970-01-03      2
# 1970-01-04      3

This is particularly inconvenient when you have several time series, some of which might be empty:

z=xts(matrix(numeric(0), dimnames=list(NULL, "Delist2")), as.Date(numeric(0)))
L=list(x,y,z) # etc.
Reduce(merge, L)
#             List1
# 1970-01-02      1
# 1970-01-03      2
# 1970-01-04      3

You lose the information about the empty time series, whereas what you want is a full column of NAs for each empty series.
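One way to get that behaviour is to merge only the non-empty series and then append an all-NA column for each empty one, aligned on the merged index. The sketch below uses a hypothetical helper, `merge_keep_empty()`, which is not part of xts. It assumes that every series carries column names and that at least one series is non-empty:

```r
library(xts)

# Hypothetical helper (not part of xts): merge a list of xts objects, keeping
# each zero-length series as an all-NA column instead of dropping it.
merge_keep_empty <- function(L) {
  is_empty <- vapply(L, function(s) NROW(s) == 0, logical(1))
  if (all(is_empty)) stop("all series are empty: no index to align on")
  res <- do.call(merge, L[!is_empty])            # ordinary merge of non-empty series
  if (any(is_empty)) {
    nms <- unlist(lapply(L[is_empty], colnames)) # columns of the empty series
    na_cols <- xts(matrix(NA_real_, nrow(res), length(nms),
                          dimnames = list(NULL, nms)),
                   index(res))                   # all-NA block on the merged dates
    res <- merge(res, na_cols)
  }
  res[, unlist(lapply(L, colnames))]             # restore the original column order
}

# Same objects as in the question (character(0) avoids needing a Date origin)
x <- xts(matrix(numeric(0), dimnames = list(NULL, "Delist1")), as.Date(character(0)))
z <- xts(matrix(numeric(0), dimnames = list(NULL, "Delist2")), as.Date(character(0)))
y <- xts(matrix(1:3, dimnames = list(NULL, "List1")), as.Date("1970-01-02") + 0:2)

merge_keep_empty(list(x, y, z))
```

Because the empty series are handled separately, the order of the list does not matter, e.g. `merge_keep_empty(list(x, z, y))` also works.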

Perhaps the rule of thumb is simply not to use empty xts objects and to use NAs instead:

x=c(Delist1=NA)
z=c(Delist2=NA)
L=list(x,y,z) # etc.
X=Reduce(merge.xts, L)
setNames(X, sapply(L, names))
#            Delist1  List1 Delist2
# 1970-01-02      NA      1      NA
# 1970-01-03      NA      2      NA
# 1970-01-04      NA      3      NA

However, the first two series cannot both be NAs:

L=list(x,z,y) # etc.
Reduce(merge.xts, L)
# Error; use instead:
do.call(merge.xts, L)

Anyway, if all elements of L are empty, do.call(...) still does not work and extra fixes are required.

Summing up:

  1. How should empty xts objects be used?
  2. Are they intended to model time series without observations?

UPDATE

This is a long comment on the solution proposed by @useR.

In my fictional time window as.Date(1:3), it would be simple to define an empty time series with an NA for each date, and this might work for theoretical cases.
In real-world scenarios, however, when you query data over a given time window you never get every day filled.

To clarify, say you query data for securities A and B over the period 2000-02-01/2000-02-04.
The NA-based empty series would look like this:

(delist=xts(rep(NA,4), as.Date("2000-02-01")+0:3))
#            [,1]
# 2000-02-01   NA
# 2000-02-02   NA
# 2000-02-03   NA
# 2000-02-04   NA

The actual data returned by the provider might look like this:

A
#            [,1]
# 2000-02-01    1
# 2000-02-02    2
# 2000-02-03    3

B
#            [,1]
# 2000-02-01    1
# 2000-02-02   NA
# 2000-02-03    3
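For reproducibility, A, B and delist can be rebuilt from the printed values. This is a sketch only: the numbers are copied from the output shown here, whereas in practice A and B would come from the data provider:

```r
library(xts)

# Values copied from the printed A and B (no column names, hence "[,1]")
A <- xts(c(1, 2, 3), as.Date("2000-02-01") + 0:2)
B <- xts(c(1, NA, 3), as.Date("2000-02-01") + 0:2)

# NA-based "empty" series defined over the whole query window (4 dates)
delist <- xts(rep(NA, 4), as.Date("2000-02-01") + 0:3)

m <- merge(A, B, delist)
m   # four rows: the last one exists only because of delist's index
```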

Assume there is no 2000-02-04 date because this is not a trading day.
Merging gives:

merge(A,B, delist)
#             A  B delist
# 2000-02-01  1  1     NA
# 2000-02-02  2 NA     NA
# 2000-02-03  3  3     NA
# 2000-02-04 NA NA     NA

Clearly, the NA in [2,2] is a genuine missing value, whereas the last row of NAs is artificially induced by the definition of delist.
The single-NA approach, by contrast, avoids this problem:

delist=NA
merge(A,B, delist)
#            A  B delist
# 2000-02-01 1  1     NA
# 2000-02-02 2 NA     NA
# 2000-02-03 3  3     NA

We cannot know in advance exactly which trading dates will be returned. Different markets, exchanges and securities follow different conventions, so identifying trading days before querying for them would be impractical at best, if it is possible at all.

UPDATE: Persistence

There is also a more subtle issue.
Above, securities A and B have a persistent representation based on actual observations. When the context requires it, for example in a merge(), NAs are added to fill alignment gaps. Defining a third security C from the dates of A and B, as above, makes for a weak definition: when you change the portfolio mix, you also change the definition of C, yet C is always the same security, the one for which the data source returned no data.
So, IMHO, C can be modelled as an NA, a NULL, a zero-length xts, etc., but not as a context-sensitive value.

Upvotes: 3

Views: 1501

Answers (2)

FXQuantTrader

Reputation: 6891

1. How should empty xts objects be used?

2. Are they intended to model time series without observations?

Answer to both:

Empty xts objects are often the result of subsetting over a time range that lies outside the range of your xts object's index. What would you expect getSymbols("AAPL"); AAPL["1900"] to return?
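The same situation can be reproduced without downloading anything. In this sketch, a small made-up daily series stands in for `AAPL`:

```r
library(xts)

# Made-up daily series covering a few days in 2007 (stand-in for AAPL)
a <- xts(matrix(1:3, dimnames = list(NULL, "Close")),
         as.Date("2007-01-02") + 0:2)

# Subsetting a year outside the index range matches no rows
e <- a["1900"]
dim(e)   # zero rows, but the column is still there
```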

When is an empty xts object useful, then? I would create an empty xts object (no actual data) with a set of dates if I wanted to pad in rows of NA data, which might later be used for other purposes. For example, if we have bid/ask values on uneven timestamps, we might want to create 5-second OHLC bars with neat start-of-bar timestamps (N.B.: in practice, if you are merging bar data at different frequencies, always use end-of-bar timestamps to avoid accidentally introducing look-ahead bias). An empty xts object helps achieve this:

library(xts)

# Create sample tick data:
set.seed(5)
st_time <- as.POSIXct('2007-01-02 09:00:00')
x <- xts(x = matrix(c(1:10, 1:10 + 0.01), ncol = 2) + rnorm(10, 0, 0.01),
         order.by = st_time + rnorm(10, 0, 3) + seq(5, 25, length.out = 10),
         dimnames = list(NULL, c("bid", "ask")))
x
#                                   bid       ask
# 2007-01-02 09:00:04.816884  2.0138436  2.023844
# 2007-01-02 09:00:06.203266  2.9874451  2.997445
# 2007-01-02 09:00:08.682891  0.9915914  1.001591
# 2007-01-02 09:00:10.673608  5.0171144  5.027114
# 2007-01-02 09:00:11.194063  4.0007014  4.010701
# 2007-01-02 09:00:14.003655  7.9936463  8.003646
# 2007-01-02 09:00:15.694152  5.9939709  6.003971
# 2007-01-02 09:00:16.541393  6.9952783  7.005278
# 2007-01-02 09:00:23.500229  8.9971423  9.007142
# 2007-01-02 09:00:24.221933 10.0013811 10.011381

to.period(x, period = "secs", k = 5, indexAt='startof')
#                              x.Open    x.High     x.Low    x.Close
# 2007-01-02 09:00:04.816884 2.013844  2.013844 2.0138436  2.0138436
# 2007-01-02 09:00:06.203266 2.987445  2.987445 0.9915914  0.9915914
# 2007-01-02 09:00:10.673608 5.017114  7.993646 4.0007014  7.9936463
# 2007-01-02 09:00:15.694152 5.993971  6.995278 5.9939709  6.9952783
# 2007-01-02 09:00:23.500229 8.997142 10.001381 8.9971423 10.0013811

# The 5-second bar timestamps are messy, so fix them using an empty xts object:
# a regular 5-second index with no data columns
emp_5sec_interval <- xts(order.by = st_time + seq(5, 25, by = 5))
x2 <- merge(x, emp_5sec_interval, fill = na.locf)
x2
#                                   bid       ask
# 2007-01-02 09:00:04.816884  2.0138436  2.023844
# 2007-01-02 09:00:05.000000  2.0138436  2.023844
# 2007-01-02 09:00:06.203266  2.9874451  2.997445
# 2007-01-02 09:00:08.682891  0.9915914  1.001591
# 2007-01-02 09:00:10.000000  0.9915914  1.001591
# 2007-01-02 09:00:10.673608  5.0171144  5.027114
# 2007-01-02 09:00:11.194063  4.0007014  4.010701
# 2007-01-02 09:00:14.003655  7.9936463  8.003646
# 2007-01-02 09:00:15.000000  7.9936463  8.003646
# 2007-01-02 09:00:15.694152  5.9939709  6.003971
# 2007-01-02 09:00:16.541393  6.9952783  7.005278
# 2007-01-02 09:00:20.000000  6.9952783  7.005278
# 2007-01-02 09:00:23.500229  8.9971423  9.007142
# 2007-01-02 09:00:24.221933 10.0013811 10.011381
# 2007-01-02 09:00:25.000000 10.0013811 10.011381

x_ohlc <- to.period(x2, period = "secs", k = 5, indexAt='startof')
x_ohlc
#                               x2.Open   x2.High     x2.Low   x2.Close
# 2007-01-02 09:00:04.816884  2.0138436  2.013844  2.0138436  2.0138436
# 2007-01-02 09:00:05.000000  2.0138436  2.987445  0.9915914  0.9915914
# 2007-01-02 09:00:10.000000  0.9915914  7.993646  0.9915914  7.9936463
# 2007-01-02 09:00:15.000000  7.9936463  7.993646  5.9939709  6.9952783
# 2007-01-02 09:00:20.000000  6.9952783 10.001381  6.9952783 10.0013811
# 2007-01-02 09:00:25.000000 10.0013811 10.001381 10.0013811 10.0013811
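Note that `emp_5sec_interval` is "empty" in a different sense from the objects in the question: it has timestamps but no columns (zero width), whereas the question's objects have a column but no timestamps (zero length). The following self-contained sketch, with made-up tick values and hypothetical names `ticks`, `grid` and `empty`, contrasts the two:

```r
library(xts)

st <- as.POSIXct("2007-01-02 09:00:00")
# Three irregular ticks (made-up values for illustration)
ticks <- xts(matrix(c(1.5, 2.5, 3.5), dimnames = list(NULL, "bid")),
             order.by = st + c(3.2, 7.9, 12.4))

# Zero width: 5-second boundaries as an index, with no data columns
grid <- xts(order.by = st + seq(5, 15, by = 5))
length(grid)          # no data values
length(index(grid))   # but three timestamps

# Zero length: one named column, no timestamps (as in the question)
empty <- xts(matrix(numeric(0), dimnames = list(NULL, "Delist1")),
             as.Date(character(0)))
dim(empty)

# Merging the zero-width grid adds its timestamps as rows (filled forward
# by na.locf) without adding any columns
padded <- merge(ticks, grid, fill = na.locf)
padded
```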

If you want to model delisted securities, create an xts object with valid dates and fill the corresponding columns (in reality likely more than one, such as OHLC or bid/ask) with NAs. For example, merge delisted securities using something like the following (as useR already suggests):

y=xts(rep(NA, 3),  as.Date(1:3))
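With several fields per security, the NA placeholder can be built from the index of data you already have. This is a sketch using a hypothetical helper `na_placeholder()` (not part of xts) and made-up OHLC column names:

```r
library(xts)

# Hypothetical helper (not part of xts): an all-NA xts with the given
# index and column names.
na_placeholder <- function(idx, cols) {
  xts(matrix(NA_real_, nrow = length(idx), ncol = length(cols),
             dimnames = list(NULL, cols)),
      order.by = idx)
}

listed <- xts(matrix(1:3, dimnames = list(NULL, "List1")),
              as.Date("1970-01-02") + 0:2)

# Placeholder OHLC columns for a delisted security, aligned on listed's dates
delisted <- na_placeholder(index(listed),
                           paste0("Delist1.", c("Open", "High", "Low", "Close")))
merged <- merge(listed, delisted)
merged
```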

I wouldn't use the delist=NA approach you suggest.

Ultimately, the xts authors are the authority on what an empty xts object is designed to represent.

Upvotes: 1

acylam

Reputation: 18691

I think the issue here is how you define an "empty time series". You defined your "empty series" as one with zero rows. Your first merge ignores x because merge() does not know how to work with zero-length series: it excludes your empty series because it has no dates to align with the other series. Instead, your delisted series should have 3 rows, each containing NA. This way it can be merged with xts objects that have the same number of rows.

library(xts)

x=xts(matrix(rep(NA, 3), dimnames=list(NULL, "Delist1")), as.Date(1:3))

y=xts(1:3,  as.Date(1:3))    
names(y)="List1"
z = merge(x,y)

# > z
#            Delist1 List1
# 1970-01-02      NA     1
# 1970-01-03      NA     2
# 1970-01-04      NA     3

class(z)

# > class(z)
# [1] "xts" "zoo"

Upvotes: 0
