A.M.

Reputation: 323

Efficiently modify list in R

I have a foreach loop that produces a list on each iteration and a .combine function to combine them that looks like this:

mergelists = function(x,xn) {
  padlen = length(x[[1]])
  # names in x that are missing from xn: add a single 0 so they get padded for this iteration
  for (n in names(x)[!names(x) %in% names(xn)])  xn[[n]] = 0
  # names in xn that x has not seen yet: prepend zeros for all previous iterations
  for (n in names(xn)[!names(xn) %in% names(x)]) xn[[n]] = c(rep(0,padlen), xn[[n]])
  # append the new values onto the accumulated results, name by name
  for (idx in names(xn)) { x[[idx]] = c( x[[idx]], xn[[idx]] ) }
  x
}

The first two for-loops modify the new list (xn) to make it compatible with the one that gathers the results (x). The last one appends xn onto x.

I believe my code is ridiculously inefficient because it re-allocates a lot and uses for-loops, but I can't think of a better solution. Any ideas?

Some more explanation: I don't know the list names in advance (they are patterns from a bootstrap exercise which takes place in the foreach part).
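
Roughly, the foreach part looks like this (the body here is only a stand-in for the real bootstrap; the names and values are made up):

library(foreach)

res <- foreach(i = 1:1000, .combine = mergelists) %do% {
  pat <- sample(c("foo", "bar", "baz"), 2)  # stand-in for the bootstrap patterns
  out <- as.list(rpois(2, 4))               # stand-in for the per-pattern values
  names(out) <- pat
  out
}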

Example:

> x
$foo
[1] 3 2

$bar
[1] 3 2

and

> xn
$foo
[1] 1

$baz
[1] 1

should join to

> x
$foo
[1] 3 2 1

$bar
[1] 3 2 0

$baz
[1] 0 0 1

That's it.
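
For completeness, the example is reproducible with the function above:

x  <- list(foo = c(3, 2), bar = c(3, 2))
xn <- list(foo = 1, baz = 1)
mergelists(x, xn)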

Upvotes: 4

Views: 852

Answers (2)

Ari B. Friedman

Reputation: 72739

If foo and bar exist in every list and appear in the same order, then mapply works. As @BenBarnes suggested, a pre-processing step to create the 0's makes this a viable option even if they don't exist everywhere. Sorting is easy. I've changed the 0's to NAs since that seems more appropriate.

# Make data
x <- list(foo=c(3,2),bar=c(6,7))
xn <- list(foo=c(1),bar=c(1),aught=c(5,2))
lol <- list(x=x,xn=xn)

# Pre-process
allnames <- sort(unique(unlist(lapply(lol, names))))
cleanlist <- function(l,allnames) {
  ret <- l[allnames]
  names(ret) <- allnames
  ret[sapply(ret,is.null)] <- NA
  ret
}
lol <- lapply(lol,cleanlist,allnames=allnames)

# Combine
do.call("mapply", c(c,lol) )

Which produces:

    aught bar foo
x      NA   6   3
xn1     5   7   2
xn2     2   1   1

Benchmarking

That said, if you're hoping for speed gains, the original version is still the fastest, presumably because it does the least. But the loopless approach is pretty elegant and scales to an arbitrary number of x's.

library(microbenchmark)
microbenchmark( mergelists(lol$x,lol$xn), mergeList2(lol$x,lol$xn), do.call("mapply", c(c,lol) ) )

Unit: microseconds
                          expr       min         lq     median         uq       max
1 do.call("mapply", c(c, lol))   155.048   159.5175   192.0635   195.5555   245.841
2    mergeList2(lol$x, lol$xn) 19938.288 20095.9905 20225.4750 20719.6730 27143.674
3    mergelists(lol$x, lol$xn)    63.416    68.1650    78.0825    84.3680    95.265
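
As for the claim that the loopless approach scales to an arbitrary number of lists: the same pre-processing and do.call work unchanged for three inputs (z and qux below are made up purely for illustration):

z <- list(bar = 9, qux = c(8, 8))          # a third, made-up result list
lol3 <- list(x = x, xn = xn, z = z)
allnames3 <- sort(unique(unlist(lapply(lol3, names))))
lol3 <- lapply(lol3, cleanlist, allnames = allnames3)
do.call("mapply", c(c, lol3))              # a matrix with columns aught, bar, foo, qux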

Upvotes: 3

BenBarnes

Reputation: 19454

In my benchmarking, this approach takes longer than yours, but since I had already worked it out, I thought I'd post it anyway. Here's to doubling effort. If the names are completely unknown and you are forced to pad with zeros in the .combine function, you could try the following (perhaps try it on a subset of your iterations first to see if it works):

library(reshape2)

mergeList2 <- function(x, xn) {
  # give every observation an ID, offsetting xn's so the two sets don't collide
  xDF <- data.frame(ID = seq_along(x[[1]]), x)
  xnDF <- data.frame(ID = seq_along(xn[[1]]) + nrow(xDF), xn)
  # reshape to long format, then cast back to wide, filling missing cells with 0
  meltedX <- melt(xDF, id = "ID")
  meltedXN <- melt(xnDF, id = "ID")
  # drop the ID column and return the padded vectors as a list
  res <- as.list(dcast(rbind(meltedX, meltedXN), ID ~ variable, 
    fill = 0))[-1]
  return(res)
}

Your example:

mergeList2(list(foo = c(3, 2), bar = c(3, 2)),
  list(foo = 1, baz= 1))

# $foo
# [1] 3 2 1

# $bar
# [1] 3 2 0

# $baz
# [1] 0 0 1

Test it out with a foreach example

library(foreach)

set.seed(1)

foreach(dd = 1:10, .combine = mergeList2) %do% {
  theNames <- sample(c("foo", "bar", "baz"), 2)
  ans <- as.list(rpois(2, 4))
  names(ans) <- theNames
  ans
}

# $foo
#  [1] 4 7 2 4 0 2 0 4 5 3

# $baz
#  [1] 7 0 0 5 3 5 3 4 0 5

# $bar
#  [1] 0 5 2 0 5 0 0 0 6 0

Upvotes: 3
