banbh

Reputation: 1525

Dynamically add column to xts object

Adding a column to an xts object is straightforward if you know the name of the column ahead of time. For example, to add a column named "b":

n <- 5
x <- merge(xts(order.by = as.Date('2015-1-1') + 1:n), a = rnorm(n))
x$b <- rnorm(n)

Adding a dynamically-named column (i.e., a column whose name is known only at runtime) is harder:

new.col.name <- 'c' # known only at runtime
x[, new.col.name] <- rnorm(n) # this generates an error

One approach is to add a column with a temporary name and then rename it:

stopifnot(!('tmp' %in% names(x)))
x$tmp <- rnorm(n)
names(x)[names(x) == 'tmp'] <- new.col.name

Is there a better way to do this? (Also, does assigning to names of an xts object result in a copy of the object being made? So, for example, would the above approach work well if n were very large?)

Upvotes: 8

Views: 3566

Answers (2)

Darren Cook

Reputation: 28928

I believe there is no good alternative. I originally wrote that column names are just an attribute, so they are cheap to modify and no copy is made, but that contradicts Joshua's answer (see the discussion in the comments). It seems dimnames<-.xts does more than just set an attribute and does copy the underlying data, so be careful.

You can also use cbind(), which is a synonym for merge.xts, but (AFAIK) it offers no advantage over the x$b method you showed:

n <- 5
x <- merge(xts(order.by = as.Date('2015-1-1') + 1:n), a = rnorm(n))
x$b <- rnorm(n)
x <- cbind(x, c = rnorm(n))
colnames(x)[3] <- "real name"

I've also shown one way to change the column name. If you don't know that it is the 3rd column, a generic approach is: colnames(x)[length(colnames(x))] = "real name"
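As a minimal sketch of the cbind-then-rename idea with a runtime name: the placeholder name tmp and the sample data are assumptions for illustration, and ncol(x) is used here as a shorter way to index the last column.

```r
library(xts)

n <- 5
x <- xts(matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "a")),
         order.by = as.Date("2015-01-01") + 1:n)

new.col.name <- "c"                   # assumed to be known only at runtime
x <- cbind(x, tmp = rnorm(n))         # append under a placeholder name
colnames(x)[ncol(x)] <- new.col.name  # rename whichever column is last
```

Note that the rename step still goes through colnames<-, so the copying caveat above applies to it as well.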

Upvotes: 1

Joshua Ulrich

Reputation: 176648

The easiest/clearest thing to do is merge the original object with the new column(s), after you convert the new column(s) to a matrix (so you can set the column name).

set.seed(21)
newData <- rnorm(n)
x1 <- merge(x, matrix(newData, ncol=1, dimnames=list(NULL, new.col.name)))
# another way to do the same thing
dim(newData) <- c(nrow(x), 1)
colnames(newData) <- new.col.name
x2 <- merge(x, newData)
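If you do this often, the first variant can be wrapped in a small function. This is only a sketch: add_named_col is a hypothetical helper, not part of xts, and the checks inside it are my own assumptions about what is worth guarding against.

```r
library(xts)

# Hypothetical helper (not part of xts): append `values` to `x` as a column
# whose name is supplied at runtime.
add_named_col <- function(x, name, values) {
  stopifnot(is.character(name), length(name) == 1L,
            !(name %in% colnames(x)),      # refuse to duplicate a column name
            length(values) == nrow(x))     # one value per existing row
  merge(x, matrix(values, ncol = 1, dimnames = list(NULL, name)))
}

n <- 5
x <- xts(matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "a")),
         order.by = as.Date("2015-01-01") + 1:n)
vals <- rnorm(n)
res <- add_named_col(x, "c", vals)
```

The function returns a new object rather than modifying x in place, so the caller decides whether to overwrite the original.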

To answer your second question: yes, assigning names (and colnames) on an xts object creates a copy. You can confirm this with tracemem and the output from gc.

> R -q  # new R session
R> x <- xts::.xts(1:1e6, 1:1e6)
R> tracemem(x)
[1] "<0x2892400>"
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  259260 13.9     592000 31.7   350000 18.7
Vcells 1445207 11.1    4403055 33.6  3445276 26.3
R> colnames(x) <- "hi"
tracemem[0x2892400 -> 0x24c1ad0]: 
tracemem[0x24c1ad0 -> 0x2c62d30]: colnames<- 
tracemem[0x2c62d30 -> 0x3033660]: dimnames<-.xts dimnames<- colnames<- 
tracemem[0x3033660 -> 0x3403f90]: dimnames<-.xts dimnames<- colnames<- 
tracemem[0x3403f90 -> 0x37d48c0]: colnames<- dimnames<-.xts dimnames<- colnames<- 
tracemem[0x37d48c0 -> 0x3033660]: dimnames<-.xts dimnames<- colnames<- 
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  259696 13.9     592000 31.7   350000 18.7
Vcells 1445750 11.1    4403055 33.6  3949359 30.2
R> print(object.size(x), units="Mb")
7.6 Mb

You can see that the colnames<- call uses ~4MB of extra memory: the "max used (Mb)" value for Vcells rises from 26.3 to 30.2. The entire xts object is ~8MB, half of which is the coredata and the other half the index, so the extra ~4MB corresponds to a copy of the coredata.

If you want to avoid the copy, you can set it manually. But be careful, because you could do something that would otherwise be caught by the "checks" in colnames<-.xts.

> R -q  # new R session
R> x <- xts::.xts(1:1e6, 1:1e6)
R> tracemem(x)
[1] "<0x2cc5330>"
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  256397 13.7     592000 31.7   350000 18.7
Vcells 1440915 11.0    4397699 33.6  3441761 26.3
R> attr(x, 'dimnames') <- list(NULL, "hi")
tracemem[0x2cc5330 -> 0x28f4a00]: 
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  256403 13.7     592000 31.7   350000 18.7
Vcells 1440916 11.0    4397699 33.6  3441761 26.3
R> print(object.size(x), units="Mb")
7.6 Mb
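If you take the manual route, you can do a few sanity checks yourself before touching the attribute. The sketch below is an assumption about which checks are sensible, not a reproduction of what xts does internally; new.names is a hypothetical variable.

```r
library(xts)

x <- xts(matrix(1:5, ncol = 1), order.by = as.Date("2015-01-01") + 1:5)

new.names <- "hi"  # hypothetical replacement column names
# Hand-written checks, since setting the attribute directly bypasses xts
stopifnot(is.character(new.names),
          !anyNA(new.names),
          length(new.names) == ncol(x))
attr(x, "dimnames") <- list(NULL, new.names)  # row names stay NULL
```

The checks and the assignment are kept at the top level rather than inside a function, because modifying an argument inside a function would normally trigger a copy anyway.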

Upvotes: 9
