Reputation: 608
I have a list of zoo-bjects consisting of irregular time-series, lodf, in the following format:
> head(lodf)
[[1]]
2014-08-08 2014-08-14 2014-09-12
1.15 1.32 2.39
[[2]]
2014-07-22 2014-07-24 2014-08-14 2014-08-20 2014-08-27 2014-09-12
0.50 0.75 1.29 1.36 1.28 1.28
[[3]]
2012-11-01 2012-11-02 2013-07-12 2013-08-13 2013-09-11 2014-07-01
1.00 1.27 0.91 1.00 0.99 0.98
...
I am ultimately trying to sum all these time-series into one combined time-series, i.e. sum down each column. To do this, I am trying to convert into a zoo/xts time-series for further manipulation , i.e. to apply na.locf and other zoo-library capabilities before summing across the individual data frames/dates using rowsum. i.e. I am trying to get my list of date frames above into a combined zoo object resembling this:
Value
12/09/2014 1.07
14/08/2014 1.32
08/08/2014 1.15
12/09/2014 0.48
27/08/2014 0.53
20/08/2014 0.61
14/08/2014 0.54
24/07/2014 0.75
22/07/2014 0.5
01/07/2014 0.98
01/07/2014 0
...
There is often over-lap between the individual data frames i.e. several values corresponding to the same date index, and What I would like to do in those cases is to sum the values. E.g. if I have
012-11-01
0.7
012-11-01
1.5
012-11-01
0.7
I would like to have
012-11-01
2.9
as the value for this date index in the resulting large data frame.
I have tried merge, reading as a zoo object, do.call(rbind) etc. in the current format, but I am stumped. For further context, this question is part of a larger project outlined here: R: time series with duplicate time index entries. Any help would be most appreciated!
Update: please find a data object below as requested:
> dput(head(lodf))
list(structure(c(1.15, 1.32, 2.39), index = structure(c(16290L,
16296L, 16325L), class = "Date"), class = "zoo"), structure(c(0.5,
0.75, 1.29, 1.36, 1.28, 1.28), index = structure(c(16273L, 16275L,
16296L, 16302L, 16309L, 16325L), class = "Date"), class = "zoo"),
structure(c(1, 1.27, 0.91, 1, 0.99, 0.98), index = structure(c(15645L,
15646L, 15898L, 15930L, 15959L, 16252L), class = "Date"), class = "zoo"),
structure(c(1.27, 1.29, 1.28, 1.17, 0.59, 0), index = structure(c(15645L,
15651L, 15665L, 15679L, 15686L, 15747L), class = "Date"), class = "zoo"),
structure(c(1.9, 1.35, 0.66, 1.16, 0.66, 1.16, 1.26, 1.23,
1.28, 1.23, 1.17, 0.66, 1.18, 0.66, 1.29, 1.35, 1.45, 1.53,
1.61, 1.82, 1.8, 1.89, 1.8, 1.81, 1.78, 1.68, 2.18, 1.68,
1.56, 1.93, 1.84, 1.69, 1.18, 1.73, 1.18, 1.72, 1.83, 1.9,
1.99, 1.93, 1.87, 1.96, 2.1, 2.22, 2.33, 2.38, 2.35, 2.23,
2.16, 2.18, 2.17, 2.2, 2.29, 2.27, 2.28, 2.42, 2.48, 2.99,
2.56, 2.65, 2.69, 3.21, 2.7, 2.8, 2.79, 2.8, 2.78, 2.26,
2.78, 2.26, 2.12, 2.07, 1.97, 1.84, 1.77, 1.18, 1.7, 1.78,
1.91, 1.98, 1.93, 1.83, 1.76, 1.18, 1.01, 0.97, 0.86, 0.69,
0.56), index = structure(c(15645L, 15652L, 15660L, 15740L,
15797L, 15841L, 15860L, 15867L, 15876L, 15887L, 15890L, 15897L,
15901L, 15905L, 15908L, 15909L, 15910L, 15911L, 15915L, 15926L,
15931L, 15932L, 15938L, 15953L, 15954L, 15975L, 15978L, 15979L,
15981L, 15982L, 15985L, 15986L, 15987L, 16001L, 16003L, 16006L,
16008L, 16010L, 16014L, 16016L, 16021L, 16022L, 16023L, 16027L,
16029L, 16031L, 16045L, 16052L, 16059L, 16072L, 16077L, 16078L,
16084L, 16091L, 16098L, 16100L, 16101L, 16106L, 16132L, 16133L,
16134L, 16139L, 16146L, 16150L, 16153L, 16157L, 16160L, 16163L,
16167L, 16169L, 16170L, 16171L, 16175L, 16177L, 16182L, 16184L,
16212L, 16216L, 16220L, 16224L, 16248L, 16254L, 16258L, 16261L,
16297L, 16301L, 16309L, 16310L, 16317L), class = "Date"), class = "zoo"),
structure(c(3.35, 3.44, 3.41, 3.14, 3.11, 2.55, 2.65, 2.87,
3.14, 3.24, 3.41, 4.04, 4.19, 4.34, 4.44, 1.2, 1.3, 1.29,
1.3, 1.27, 0.77, 0.69, 0.55, 0), index = structure(c(15645L,
15650L, 15694L, 15740L, 15741L, 15742L, 15743L, 15749L, 15750L,
15751L, 15755L, 15756L, 15758L, 15762L, 15784L, 15800L, 15805L,
15810L, 15824L, 15835L, 15838L, 15840L, 15847L, 15849L), class = "Date"), class = "zoo"))
>
Upvotes: 3
Views: 842
Reputation: 269501
The input displayed at the top of the question appears to be the first three components of the input specified at the bottom of the question. The variable name used at the bottom of the question, lodf
, seems to suggest that it contains a list of data frames but in fact it contains a list of zoo objects.
The question asks for a single data frame result but we are assuming that the output should be a single zoo series too, for consistency. Also we shall use the name L
for the input as lodf
would wrongly suggest a list of data frames. If z
is the result as a zoo series then
data.frame(index = index(z), data = coredata(z))
could be used if a data frame really were desired.
In the output section near the end of this answer we show the result of using as our input L <- lodf[1:3]
(i.e. first 3 components only) and separately show the output using L <- lodf
(i.e. all components) as our input.
1) Reduce. We merge the zoo series in the list, L
, returning a list and filling in missing values with 0. Then use Reduce
to sum the components:
Reduce(`+`, do.call(merge, c(L, retclass = "list", fill = 0)))
1a) A variation of this is to return a zoo object from merge
(which is the default if we do not specify retclass
), then fill in its NAs with 0, turn it back into a list and use Reduce
:
Reduce(`+`, as.list(na.fill(do.call(merge, L), 0)))
2) rowSums In this solution we merge the lists to give zoo object z
, optionally add column names and then add across rows producing the final zoo object.
z <- do.call(merge, L)
colnames(L) <- seq_along(L) # optionally add names
zoo(rowSums(z, na.rm = TRUE), time(z))
Note that a rowSums
solution of zoo objects previously appeared here
3) + If we knew that there were exactly 3 components to the list then an alternate way to write the above would be this. We optionally add names 1, 2, 3, merge the zoo objects and fill NAs with 0. Finally we add the series together. Modify in the obvious way if the number of components differs.
z0 <- na.fill(do.call(merge, L), 0)
colnames(z0) <- 1:3 # optionally add names 1, 2, 3
z0[, 1] + z0[, 2] + z0[, 3]
Output Using L <- lodf[1:3]
as displayed at the start of the question where lodf
is shown at the bottom of the question our output is:
2012-11-01 2012-11-02 2013-07-12 2013-08-13 2013-09-11 2014-07-01 2014-07-22
1.00 1.27 0.91 1.00 0.99 0.98 0.50
2014-07-24 2014-08-08 2014-08-14 2014-08-20 2014-08-27 2014-09-12
0.75 1.15 2.61 1.36 1.28 3.67
or using L <- locf
in the above we get the following (except for solution 3 which would have to be modified in an obvious way to use 6 rather than 3 components):
2012-11-01 2012-11-02 2012-11-06 2012-11-07 2012-11-08 2012-11-16 2012-11-21
7.52 1.27 3.44 1.29 1.35 0.66 1.28
2012-12-05 2012-12-12 2012-12-20 2013-02-04 2013-02-05 2013-02-06 2013-02-07
1.17 0.59 3.41 4.30 3.11 2.55 2.65
2013-02-11 2013-02-13 2013-02-14 2013-02-15 2013-02-19 2013-02-20 2013-02-22
0.00 2.87 3.14 3.24 3.41 4.04 4.19
2013-02-26 2013-03-20 2013-04-02 2013-04-05 2013-04-10 2013-04-15 2013-04-29
4.34 4.44 0.66 1.20 1.30 1.29 1.30
2013-05-10 2013-05-13 2013-05-15 2013-05-16 2013-05-22 2013-05-24 2013-06-04
1.27 0.77 0.69 1.16 0.55 0.00 1.26
2013-06-11 2013-06-20 2013-07-01 2013-07-04 2013-07-11 2013-07-12 2013-07-15
1.23 1.28 1.23 1.17 0.66 0.91 1.18
2013-07-19 2013-07-22 2013-07-23 2013-07-24 2013-07-25 2013-07-29 2013-08-09
0.66 1.29 1.35 1.45 1.53 1.61 1.82
2013-08-13 2013-08-14 2013-08-15 2013-08-21 2013-09-05 2013-09-06 2013-09-11
1.00 1.80 1.89 1.80 1.81 1.78 0.99
2013-09-27 2013-09-30 2013-10-01 2013-10-03 2013-10-04 2013-10-07 2013-10-08
1.68 2.18 1.68 1.56 1.93 1.84 1.69
2013-10-09 2013-10-23 2013-10-25 2013-10-28 2013-10-30 2013-11-01 2013-11-05
1.18 1.73 1.18 1.72 1.83 1.90 1.99
2013-11-07 2013-11-12 2013-11-13 2013-11-14 2013-11-18 2013-11-20 2013-11-22
1.93 1.87 1.96 2.10 2.22 2.33 2.38
2013-12-06 2013-12-13 2013-12-20 2014-01-02 2014-01-07 2014-01-08 2014-01-14
2.35 2.23 2.16 2.18 2.17 2.20 2.29
2014-01-21 2014-01-28 2014-01-30 2014-01-31 2014-02-05 2014-03-03 2014-03-04
2.27 2.28 2.42 2.48 2.99 2.56 2.65
2014-03-05 2014-03-10 2014-03-17 2014-03-21 2014-03-24 2014-03-28 2014-03-31
2.69 3.21 2.70 2.80 2.79 2.80 2.78
2014-04-03 2014-04-07 2014-04-09 2014-04-10 2014-04-11 2014-04-15 2014-04-17
2.26 2.78 2.26 2.12 2.07 1.97 1.84
2014-04-22 2014-04-24 2014-05-22 2014-05-26 2014-05-30 2014-06-03 2014-06-27
1.77 1.18 1.70 1.78 1.91 1.98 1.93
2014-07-01 2014-07-03 2014-07-07 2014-07-10 2014-07-22 2014-07-24 2014-08-08
0.98 1.83 1.76 1.18 0.50 0.75 1.15
2014-08-14 2014-08-15 2014-08-19 2014-08-20 2014-08-27 2014-08-28 2014-09-04
2.61 1.01 0.97 1.36 2.14 0.69 0.56
2014-09-12
Updates Added additional solutions and re-arranged and expanded presentation.
Upvotes: 5
Reputation: 24535
With base R:
lodf = list(structure(list(`014-08-08` = 1.15, `2014-08-14` = 1.32,
`2014-09-12` = 2.39), .Names = c("014-08-08", "2014-08-14",
"2014-09-12"), class = "data.frame", row.names = c(NA, -1L)),
structure(list(`2014-07-22` = 0.5, `2014-07-24` = 0.75, `2014-08-14` = 1.29,
`2014-08-20` = 1.36, `2014-08-27` = 1.28, `2014-09-12` = 1.28), .Names = c("2014-07-22",
"2014-07-24", "2014-08-14", "2014-08-20", "2014-08-27", "2014-09-12"
), class = "data.frame", row.names = c(NA, -1L)), structure(list(
`2012-11-01` = 1, `2012-11-02` = 1.27, `2013-07-12` = 0.91,
`2013-08-13` = 1, `2013-09-11` = 0.99, `2014-07-01` = 0.98), .Names = c("2012-11-01",
"2012-11-02", "2013-07-12", "2013-08-13", "2013-09-11", "2014-07-01"
), class = "data.frame", row.names = c(NA, -1L)))
lodf
[[1]]
014-08-08 2014-08-14 2014-09-12
1 1.15 1.32 2.39
[[2]]
2014-07-22 2014-07-24 2014-08-14 2014-08-20 2014-08-27 2014-09-12
1 0.5 0.75 1.29 1.36 1.28 1.28
[[3]]
2012-11-01 2012-11-02 2013-07-12 2013-08-13 2013-09-11 2014-07-01
1 1 1.27 0.91 1 0.99 0.98
ddf = data.frame(full=character(), stringsAsFactors=F)
ll = unlist(lapply(lodf, function(x) paste(names(x), x, sep='_')))
ddf[1:length(ll),1]=ll
ddf
full
1 014-08-08_1.15
2 2014-08-14_1.32
3 2014-09-12_2.39
4 2014-07-22_0.5
5 2014-07-24_0.75
6 2014-08-14_1.29
7 2014-08-20_1.36
8 2014-08-27_1.28
9 2014-09-12_1.28
10 2012-11-01_1
11 2012-11-02_1.27
12 2013-07-12_0.91
13 2013-08-13_1
14 2013-09-11_0.99
15 2014-07-01_0.98
ddf$date = unlist(lapply(strsplit(ddf$full, '_'),function(x)x[1]))
ddf$value = as.numeric(unlist(lapply(strsplit(ddf$full, '_'),function(x)x[2])))
ddf = ddf[,-1]
ddf
date value
1 014-08-08 1.15
2 2014-08-14 1.32
3 2014-09-12 2.39
4 2014-07-22 0.50
5 2014-07-24 0.75
6 2014-08-14 1.29
7 2014-08-20 1.36
8 2014-08-27 1.28
9 2014-09-12 1.28
10 2012-11-01 1.00
11 2012-11-02 1.27
12 2013-07-12 0.91
13 2013-08-13 1.00
14 2013-09-11 0.99
15 2014-07-01 0.98
Finally:
aggregate(value~date, ddf, sum)
date value
1 2012.11.01 1.00
2 2012.11.02 1.27
3 2013.07.12 0.91
4 2013.08.13 1.00
5 2013.09.11 0.99
6 2014.07.01 0.98
7 2014.07.22 0.50
8 2014.07.24 0.75
9 2014.08.08 1.15
10 2014.08.14 2.61
11 2014.08.20 1.36
12 2014.08.27 1.28
13 2014.09.12 3.67
Upvotes: 1
Reputation: 887068
Try (If the list elements are list of zoo
objects and if you need to get the sum
of the matching index).
library(xts)
library(zoo)
z1 <- setNames(do.call(`merge`, lodf), paste0("Value", seq_along(lodf)))
xts(data.frame(value=rowSums(z1, na.rm=TRUE)), order.by=index(z1))
# value
#2012-11-01 1.00
#2012-11-02 1.27
#2013-07-12 0.91
#2013-08-13 1.00
#2013-09-11 0.99
#2014-07-01 0.98
#2014-07-22 0.50
#2014-07-24 0.75
#2014-08-08 1.15
#2014-08-14 2.61
#2014-08-20 1.36
#2014-08-27 1.28
#2014-09-12 3.67
If you need to use na.locf
before summing
z2 <- na.locf(z1)
xts(data.frame(value=rowSums(z2, na.rm=TRUE)), order.by=index(z2))
lodf <- list(structure(c(1.15, 1.32, 2.39), index = structure(c(16290,
16296, 16325), class = "Date"), class = "zoo"), structure(c(0.5,
0.75, 1.29, 1.36, 1.28, 1.28), index = structure(c(16273, 16275,
16296, 16302, 16309, 16325), class = "Date"), class = "zoo"),
structure(c(1, 1.27, 0.91, 1, 0.99, 0.98), index = structure(c(15645,
15646, 15898, 15930, 15959, 16252), class = "Date"), class = "zoo"))
Upvotes: 1