leodido
leodido

Reputation: 922

Revert list structure

GOAL

Given a list of lists my goal is to reverse its structure (R language).

So, I want to bring the elements of the nested lists to be elements of the tier one list.

Probably an example better specifies my purpose. Given:

z <- list(z1 = list(a = 1, b = 2, c = 3), z2 = list(b = 4, a = 1, c = 0))

I want an output equivalent to the subsequent R object:

o <- list(a = list(z1 = 1, z2 = 1), b = list(z1 = 2, z2 = 4), c = list(z1 = 3, z2 = 0))

SOLUTIONS

MY SOLUTION

I created my own solution, which I am attaching below, but let me know if there is some better.

revert_list_str_1 <- function(ls) {
  res <- lapply(names(ls[[1]]), function(n, env) {
    name <- paste(n, 'elements', sep = '_')
    assign(name, vector('list', 0))
    inner <- sapply(ls, function(x) {
      assign(name, c(get(name), x[which(names(x) == n)]))
    })
    names(inner) <- names(ls)

    inner
  })
  names(res) <- names(ls[[1]])

  res
}

Executing str(revert_list_str_1(z)) I obtain the subsequent output, corresponding to what I wanted.

List of 3
 $ a:List of 2
  ..$ z1: num 1
  ..$ z2: num 1
 $ b:List of 2
  ..$ z1: num 2
  ..$ z2: num 4
 $ c:List of 2
  ..$ z1: num 3
  ..$ z2: num 0

But as I said I'd like to investigate (and learn) the existence of a more elegant and dynamic solution.

In fact my solution works fully only if all the nested lists have the same names (also in different order). This because of names(ls[[1]]). I would also point out that it acts only on lists of 2 levels, like the one reported.

So, do you know other solutions that are more dynamic? Can rapply and/or Filter functions be useful for this task?

end edit 1.

ANALYSIS OF PROPOSED SOLUTIONS

I've done a little analysis of the proposes solutions, thans you all !. The analysis consists of verifying the following points for all functions:

  1. accepted classes (nested list elements)
    1. type preserved also if there are elements with different types (if they are atomic)
    2. object contained in the elements preserved (e.g. a matrix)
  2. columns considered (for columns I mean the names of the nested lists)
    1. not common columns ignored (the classification 'not' is understood positively in this case)
    2. not common columns preserved
    3. it works also when columns do not match (based only on the names of the first nested list)

In all this cases the classification 'yes' is understood positively execept for point 2.1.

This are all the functions I've considered (the comments relate to the analysis items mentioned above):

# yes 1.1
# yes 1.2
# yes 2.1, not 2.2, not 2.3
revert_list_str_1 <- function(ls) { # @leodido
    # see above
}

# not 1.1
# not 1.2
# not 2.1, not 2.2, not 2.3
revert_list_str_2 <- function(ls) { # @mnel
  # convert each component of list to a data.frame
  # so rbind.data.frame so named elements are matched
  x <- data.frame((do.call(rbind, lapply(ls, data.frame))))
  # convert each column into an appropriately named list
  o <- lapply(as.list(x), function(i, nam) as.list(`names<-`(i, nam)), nam = rownames(x))

  o
}

# yes 1.1
# yes 1.2
# yes 2.1, not 2.2, yes 2.3
revert_list_str_3 <- function(ls) { # @mnel
  # unique names
  nn <- Reduce(unique, lapply(ls, names))
  # convert from matrix to list `[` used to ensure correct ordering
  as.list(data.frame(do.call(rbind,lapply(ls, `[`, nn))))
}

# yes 1.1
# yes 1.2
# yes 2.1, not 2.2, yes 2.3
revert_list_str_4 <- function(ls) { # @Josh O'Brien
  # get sub-elements in same order
  x <- lapply(ls, `[`, names(ls[[1]]))
  # stack and reslice
  apply(do.call(rbind, x), 2, as.list) 
}

# not 1.1
# not 1.2
# not 2.1, not 2.2, not 2.3
revert_list_str_5 <- function(ls) { # @mnel
  apply(data.frame((do.call(rbind, lapply(ls, data.frame)))), 2, as.list)
}

# not 1.1
# not 1.2
# not 2.1, yes 2.2, yes 2.3
revert_list_str_6 <- function(ls) { # @baptiste + @Josh O'Brien
  b <- recast(z, L2 ~ L1)
  apply(b, 1, as.list)
}

# yes 1.1
# yes 1.2
# not 2.1, yes 2.2, yes 2.3
revert_list_str_7 <-  function(ll) { # @Josh O'Brien
  nms <- unique(unlist(lapply(ll, function(X) names(X))))
  ll <- lapply(ll, function(X) setNames(X[nms], nms))
  ll <- apply(do.call(rbind, ll), 2, as.list)
  lapply(ll, function(X) X[!sapply(X, is.null)])
}

CONSIDERATIONS

From this analysis emerges that:

BENCHMARKS

To complete the work I've done some little benchmarks (with microbenchmark R package) on this 4 functions (times = 1000 for each benchmark).

BENCHMARK 1

Input:

list(z1 = list(a = 1, b = 2, c = 3), z2 = list(a = 0, b = 3, d = 22, f = 9)).

Results:

Unit: microseconds
    expr       min         lq     median         uq       max
1 func_1   250.069   467.5645   503.6420   527.5615  2028.780
2 func_3   204.386   393.7340   414.5485   429.6010  3517.438
3 func_4    89.922   173.7030   189.0545   194.8590  1669.178
4 func_6 11295.463 20985.7525 21433.8680 21934.5105 72476.316
5 func_7   348.585   387.0265   656.7270   691.2060  2393.988

Winner: revert_list_str_4.

BENCHMARK 2

Input:

list(z1 = list(a = 1, b = 2, c = 'ciao'), z2 = list(a = 0, b = 3, c = 5)).

revert_list_str_6 excluded because it does not support different type of nested child elements.

Results:

Unit: microseconds
    expr     min       lq   median       uq      max
1 func_1 249.558 483.2120 502.0915 550.7215 2096.978
2 func_3 210.899 387.6835 400.7055 447.3785 1980.912
3 func_4  92.420 170.9970 182.0335 192.8645 1857.582
4 func_7 257.772 469.9280 477.8795 487.3705 2035.101

Winner: revert_list_str_4.

BENCHMARK 3

Input:

list(z1 = list(a = 1, b = m, c = 'ciao'), z2 = list(a = 0, b = 3, c = m)).

m is a matrix 3x3 of integers and revert_list_str_6 has been excluded again.

Results:

Unit: microseconds
expr     min       lq   median       uq      max
1 func_1 261.173 484.6345 503.4085 551.6600 2300.750
2 func_3 209.322 393.7235 406.6895 449.7870 2118.252
3 func_4  91.556 174.2685 184.5595 196.2155 1602.983
4 func_7 252.883 474.1735 482.0985 491.9485 2058.306

Winner: revert_list_str_4. Again!

end edit 2.

CONCLUSION

First of all: thanks to all, wonderful solutions.

In my opinion if you know in advance that you list will have nested list with the same names reverse_str_4 is the winner as best compromise between performances and support for different types.

The most complete solution is revert_list_str_7 although the full flexibility induces an average of about 2.5 times a worsening of performances compared to reverse_str_4 (useful if your nested list have different names).

Upvotes: 23

Views: 4980

Answers (6)

nbenn
nbenn

Reputation: 692

I'd like to add a further solution to this valuable collection (to which I have turned many times):

revert_list_str_9 <- function(x) do.call(Map, c(c, x))

If this were code golf, we'd have a clear winner! Of course, this requires the individual list entries to be in the same order. This can be extended, using various ideas from above, such as

revert_list_str_10 <- function(x) {
  nme <- names(x[[1]]) # from revert_list_str_4
  do.call(Map, c(c, lapply(x, `[`, nme)))
}

revert_list_str_11 <- function(x) {
  nme <- Reduce(unique, lapply(x, names)) # from revert_list_str_3
  do.call(Map, c(c, lapply(x, `[`, nme)))
}

Performance-wise it's also not too shabby. If stuff is properly sorted, we have a new base R solution to beat. If not, timings still are very competitive.

z <- list(z1 = list(a = 1, b = 2, c = 3), z2 = list(b = 4, a = 1, c = 0))

microbenchmark::microbenchmark(
  revert_list_str_1(z), revert_list_str_2(z),  revert_list_str_3(z),
  revert_list_str_4(z), revert_list_str_5(z),  revert_list_str_7(z),
  revert_list_str_9(z), revert_list_str_10(z), revert_list_str_11(z),
  times = 1e3
)

#> Unit: microseconds
#>                   expr     min       lq      mean   median       uq       max
#>   revert_list_str_1(z)  51.946  60.9845  67.72623  67.2540  69.8215  1293.660
#>   revert_list_str_2(z) 461.287 482.8720 513.21260 490.5495 498.8110  1961.542
#>   revert_list_str_3(z)  80.180  89.4905  99.37570  92.5800  95.3185  1424.012
#>   revert_list_str_4(z)  19.383  24.2765  29.50865  26.9845  29.5385  1262.080
#>   revert_list_str_5(z) 499.433 525.8305 583.67299 533.1135 543.4220 25025.568
#>   revert_list_str_7(z)  56.647  66.1485  74.53956  70.8535  74.2445  1309.346
#>   revert_list_str_9(z)   6.128   7.9100  11.50801  10.2960  11.5240  1591.422
#>  revert_list_str_10(z)   8.455  10.9500  16.06621  13.2945  14.8430  1745.134
#>  revert_list_str_11(z)  14.953  19.8655  26.79825  22.1805  24.2885  2084.615

Unfortunately, this is not my creation, but exists courtesy of @thelatemail.

Upvotes: 1

Michael Lawrence
Michael Lawrence

Reputation: 1021

How about this simple solution, which is completely general, and almost as fast as Josh O'Brien's original answer that assumed common internal names (#4).

zv <- unlist(unname(z), recursive=FALSE)
ans <- split(setNames(zv, rep(names(z), lengths(z))), names(zv))

And here is a general version that is robust to not having names:

invertList <- function(z) {
    zv <- unlist(unname(z), recursive=FALSE)
    zind <- if (is.null(names(zv))) sequence(lengths(z)) else names(zv)
    if (!is.null(names(z)))
        zv <- setNames(zv, rep(names(z), lengths(z)))
    ans <- split(zv, zind)
    if (is.null(names(zv))) 
        ans <- unname(ans)
    ans
}

Upvotes: 3

Jacob H
Jacob H

Reputation: 4513

The recently released purrr contains a function, transpose, whose's purpose is to 'revert' a list structure. There is a major caveat to the transpose function, it matches on position and not name, https://cran.r-project.org/web/packages/purrr/purrr.pdf. These means that it is not the correct tool for the benchmark 1 above. I therefore only consider benchmark 2 and 3 below.

Benchmark 2

B2 <- list(z1 = list(a = 1, b = 2, c = 'ciao'), z2 = list(a = 0, b = 3, c = 5))

revert_list_str_8 <-  function(ll) { # @z109620
  transpose(ll)
}

microbenchmark(revert_list_str_1(B2), revert_list_str_3(B2), revert_list_str_4(B2), revert_list_str_7(B2), revert_list_str_8(B2), times = 1e3)
Unit: microseconds
                  expr     min       lq       mean   median       uq      max neval
 revert_list_str_1(B2) 228.752 254.1695 297.066646 268.8325 293.5165 4501.231  1000
 revert_list_str_3(B2) 211.645 232.9070 277.149579 250.9925 278.6090 2512.361  1000
 revert_list_str_4(B2)  79.673  92.3810 112.889130 100.2020 111.4430 2522.625  1000
 revert_list_str_7(B2) 237.062 252.7030 293.978956 264.9230 289.1175 4838.982  1000
 revert_list_str_8(B2)   2.445   6.8440   9.503552   9.2880  12.2200  148.591  1000

Clearly function transpose is the winner! It also utilizes much less code.

Benchmark 3

B3 <- list(z1 = list(a = 1, b = m, c = 'ciao'), z2 = list(a = 0, b = 3, c = m))

microbenchmark(revert_list_str_1(B3), revert_list_str_3(B3), revert_list_str_4(B3), revert_list_str_7(B3), revert_list_str_8(B3), times = 1e3)

 Unit: microseconds
                  expr     min       lq       mean  median      uq      max neval
 revert_list_str_1(B3) 229.242 253.4360 280.081313 266.877 281.052 2818.341  1000
 revert_list_str_3(B3) 213.600 232.9070 271.793957 248.304 272.743 2739.646  1000
 revert_list_str_4(B3)  80.161  91.8925 109.713969  98.980 108.022 2403.362  1000
 revert_list_str_7(B3) 236.084 254.6580 287.274293 264.922 280.319 2718.628  1000
 revert_list_str_8(B3)   2.933   7.3320   9.140367   9.287  11.243   55.233  1000

Again, transpose outperforms all others.

The problem with these above benchmarks test is that they focus on very small lists. For this reason, the numerous loops nested within functions 1-7 do not pose too much of a problem. As the size of the list and therefore the iteration increase, the speed gains of transpose will likely increase.

The purrr package is awesome! It does a lot more than revert lists. In combination with the dplyr package, the purrr package makes it possible to do most of your coding using the poweriful and beautiful functional programming paradigm. Thank the lord for Hadley!

Upvotes: 4

mnel
mnel

Reputation: 115382


EDIT -- working from @Josh O'Briens suggestion and my own improvemes

The problem was that do.call rbind was not calling rbind.data.frame which does some matching of names. rbind.data.frame should work, because data.frames are lists and each sublist is a list, so we could just call it directly.

apply(do.call(rbind.data.frame, z), 1, as.list)

However, while this may be succicint, it is slow because do.call(rbind.data.frame, ...) is inherently slow.


Something like (in two steps)

 # convert each component of z to a data.frame
 # so rbind.data.frame so named elements are matched
 x <- data.frame((do.call(rbind, lapply(z, data.frame))))
 # convert each column into an appropriately named list
 o <- lapply(as.list(x), function(i,nam) as.list(`names<-`(i, nam)), nam = rownames(x))
 o
$a
$a$z1
[1] 1

$a$z2
[1] 1


$b
$b$z1
[1] 2

$b$z2
[1] 4


$c
$c$z1
[1] 3

$c$z2
[1] 0

And an alternative

# unique names
nn <- Reduce(unique,lapply(z, names))
# convert from matrix to list `[` used to ensure correct ordering
as.list(data.frame(do.call(rbind,lapply(z, `[`, nn))))

Upvotes: 12

Josh O&#39;Brien
Josh O&#39;Brien

Reputation: 162321

Edit:

Here's a more flexible version that will work on lists whose elements don't necessarily contain the same set of sub-elements.

fun <-  function(ll) {
    nms <- unique(unlist(lapply(ll, function(X) names(X))))
    ll <- lapply(ll, function(X) setNames(X[nms], nms))
    ll <- apply(do.call(rbind, ll), 2, as.list)
    lapply(ll, function(X) X[!sapply(X, is.null)])
}

## An example of an 'unbalanced' list
z <- list(z1 = list(a = 1, b = 2), 
          z2 = list(b = 4, a = 1, c = 0))
## Try it out
fun(z)

Original answer

z <- list(z1 = list(a = 1, b = 2, c = 3), z2 = list(b = 4, a = 1, c = 0))

zz <- lapply(z, `[`, names(z[[1]]))   ## Get sub-elements in same order
apply(do.call(rbind, zz), 2, as.list) ## Stack and reslice

Upvotes: 13

baptiste
baptiste

Reputation: 77096

reshape can get you close,

library(reshape)
b = recast(z, L2~L1)
split(b[,-1], b$L2)

Upvotes: 6

Related Questions