Reputation: 7517
I have a list of lists of data.frames (see L below). I was wondering if it is possible to convert L to my desired output shown below, which is a single data.frame?
L <- list(A = list(Short = data.frame(d = 1:2, SD = 3:4)),
B = list(Short = data.frame(d = 2:3, SD = 1:2), Long1 = data.frame(d = 7:8, SD = 6:7)),
C = list(Short = data.frame(d = 5:6, SD = 3:4), Long1 = data.frame(d = 8:9, SD = 1:2),
Long2 = data.frame(d = 4:5, SD = 6:7)))
Desired output (a data.frame):
d SD id
1 3 1
2 4 1
2 1 2
3 2 2
7 6 2
8 7 2
5 3 3
6 4 3
8 1 3
9 2 3
4 6 3
5 7 3
Upvotes: 4
Reputation: 42592
rbindlist() is a convenience function which makes one data.table from a list of many. For this nested list it has to be applied twice: once to each inner list, and once more to the resulting list of data.tables.
In addition, it has the idcol parameter, which creates a column in the result showing which list item those rows came from.
library(data.table)
rbindlist(lapply(L, rbindlist, idcol = "es.type"), idcol = "id")
    id es.type d SD
 1:  A   Short 1  3
 2:  A   Short 2  4
 3:  B   Short 2  1
 4:  B   Short 3  2
 5:  B   Long1 7  6
 6:  B   Long1 8  7
 7:  C   Short 5  3
 8:  C   Short 6  4
 9:  C   Long1 8  1
10:  C   Long1 9  2
11:  C   Long2 4  6
12:  C   Long2 5  7
Now, the OP has requested that id be numeric and that Long1 and Long2 become Long. This can be achieved by subsequent operations on the result columns:
rbindlist(lapply(L, rbindlist, idcol = "es.type"), idcol = "id")[
, id := rleid(id)][
, es.type := sub("\\d+$", "", es.type)][]
    id es.type d SD
 1:  1   Short 1  3
 2:  1   Short 2  4
 3:  2   Short 2  1
 4:  2   Short 3  2
 5:  2    Long 7  6
 6:  2    Long 8  7
 7:  3   Short 5  3
 8:  3   Short 6  4
 9:  3    Long 8  1
10:  3    Long 9  2
11:  3    Long 4  6
12:  3    Long 5  7
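To make the id step more concrete, here is a minimal base R sketch of what a run-length id does: each run of consecutive identical values gets the next integer. The helper name run_id is hypothetical and only meant for illustration; it is not part of data.table.

```r
# Hypothetical helper: number the runs of consecutive identical values
run_id <- function(x) {
  runs <- rle(as.character(x))                 # lengths of consecutive runs
  rep(seq_along(runs$lengths), runs$lengths)   # run number repeated per run length
}

ids <- c("A", "A", "B", "B", "B", "B", "C")
run_id(ids)
# 1 1 2 2 2 2 3
```

Note that a value which reappears after a gap starts a new run, so run_id(c("A", "B", "A")) gives 1 2 3 rather than 1 2 1. For the list in the question this makes no difference, because each name occupies one contiguous block.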
In base R, we can achieve the same by
do.call("rbind", lapply(L, do.call, what = "rbind"))
which returns
          d SD
A.Short.1 1  3
A.Short.2 2  4
B.Short.1 2  1
B.Short.2 3  2
B.Long1.1 7  6
B.Long1.2 8  7
C.Short.1 5  3
C.Short.2 6  4
C.Long1.1 8  1
C.Long1.2 9  2
C.Long2.1 4  6
C.Long2.2 5  7
id and es.type can be retrieved by parsing the row names, e.g.,
DF <- do.call("rbind", lapply(L, do.call, what = "rbind"))
id <- stringr::str_extract(row.names(DF), "^[^.]*")
# create sequence number (that's what data.table::rleid() does)
DF$id <- c(1L, cumsum(head(id, -1L) != tail(id, -1L)) + 1L)
DF$es.type <- stringr::str_extract(row.names(DF), "(?<=\\.)[^.0-9]*")
row.names(DF) <- NULL
DF
   d SD id es.type
1  1  3  1   Short
2  2  4  1   Short
3  2  1  2   Short
4  3  2  2   Short
5  7  6  2    Long
6  8  7  2    Long
7  5  3  3   Short
8  6  4  3   Short
9  8  1  3    Long
10 9  2  3    Long
11 4  6  3    Long
12 5  7  3    Long
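If adding a stringr dependency is not wanted, the row names can probably be parsed with base R alone. The following is a sketch under the assumption that neither the outer nor the inner list names contain a dot; the vector rn is a small illustrative sample of row names in the format shown above, and the variable names are mine.

```r
# Illustrative row names in the "outer.inner.rownumber" format
rn <- c("A.Short.1", "A.Short.2", "B.Long1.1", "C.Long2.2")

# Split on literal dots (assumes the list names themselves contain no dot)
parts <- strsplit(rn, ".", fixed = TRUE)
outer_name <- vapply(parts, `[`, character(1), 1)
inner_name <- vapply(parts, `[`, character(1), 2)

# Number the outer names in order of first appearance
id <- match(outer_name, unique(outer_name))
# Drop trailing digits so that Long1 and Long2 both become Long
es.type <- sub("\\d+$", "", inner_name)

id
# 1 1 2 3
es.type
# "Short" "Short" "Long" "Long"
```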
Upvotes: 0
Reputation: 6244
Here is another possible approach using purrr's flatten_dfr:
library(purrr)
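# Repeat each outer index once per row contributed by that element;
# rows are counted with nrow() (lengths() would count columns instead)
transform(flatten_dfr(L), id = rep(seq_along(L), times = map_int(L, ~ sum(map_int(.x, nrow)))))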
#> d SD id
#> 1 1 3 1
#> 2 2 4 1
#> 3 2 1 2
#> 4 3 2 2
#> 5 7 6 2
#> 6 8 7 2
#> 7 5 3 3
#> 8 6 4 3
#> 9 8 1 3
#> 10 9 2 3
#> 11 4 6 3
#> 12 5 7 3
NB: here I used base R's transform, which could be replaced by dplyr's mutate.
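The id column in this approach depends on knowing how many rows each outer list element contributes. As a rough sketch of that step in base R only, using a shortened copy of the list from the question (the name rows_per_group is mine):

```r
# Shortened copy of the question's list, for illustration
L <- list(A = list(Short = data.frame(d = 1:2, SD = 3:4)),
          B = list(Short = data.frame(d = 2:3, SD = 1:2),
                   Long1 = data.frame(d = 7:8, SD = 6:7)))

# Total number of rows across the data.frames of each outer element
rows_per_group <- vapply(L, function(x) sum(vapply(x, nrow, integer(1))), integer(1))

# Repeat each outer position once per row it contributes
id <- rep(seq_along(L), times = rows_per_group)
id
# 1 1 2 2 2 2
```

Counting with nrow keeps the ids correct even when the data.frames do not happen to have as many rows as columns.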
Upvotes: 0
Reputation: 887971
We can use lapply/Map in base R. We can loop through the list with lapply, rbind the nested list elements, then create a new column with Map and rbind the outer list elements:
out <- do.call(rbind, Map(cbind, lapply(L, function(x)
do.call(rbind, x)), id = seq_along(L)))
row.names(out) <- NULL
out
# d SD id
#1 1 3 1
#2 2 4 1
#3 2 1 2
#4 3 2 2
#5 7 6 2
#6 8 7 2
#7 5 3 3
#8 6 4 3
#9 8 1 3
#10 9 2 3
#11 4 6 3
#12 5 7 3
Based on the comments, if we need to add another column from the names of the inner list:
out1 <- do.call(rbind, Map(cbind, lapply(L, function(dat)
do.call(rbind, Map(cbind, dat, es.type = names(dat)))), id = seq_along(L)))
row.names(out1) <- NULL
out1
# d SD es.type id
#1 1 3 Short 1
#2 2 4 Short 1
#3 2 1 Short 2
#4 3 2 Short 2
#5 7 6 Long1 2
#6 8 7 Long1 2
#7 5 3 Short 3
#8 6 4 Short 3
#9 8 1 Long1 3
#10 9 2 Long1 3
#11 4 6 Long2 3
#12 5 7 Long2 3
If the inner names end in digits (optionally preceded by dots, i.e. they match \\.*\\d+$) and we want to remove that suffix:
out1 <- do.call(rbind, Map(cbind, lapply(L, function(dat)
do.call(rbind, Map(cbind, dat,
es.type = sub("\\.*\\d+$", "", names(dat))))), id = seq_along(L)))
row.names(out1) <- NULL
out1
# d SD es.type id
#1 1 3 Short 1
#2 2 4 Short 1
#3 2 1 Short 2
#4 3 2 Short 2
#5 7 6 Long 2
#6 8 7 Long 2
#7 5 3 Short 3
#8 6 4 Short 3
#9 8 1 Long 3
#10 9 2 Long 3
#11 4 6 Long 3
#12 5 7 Long 3
Upvotes: 1
Reputation: 389325
We could try rbinding every list in L, adding a new column which denotes the list number, and finally bringing the entire list into one data frame using do.call and rbind.
output <- do.call(rbind, lapply(seq_along(L), function(x)
transform(do.call(rbind, L[[x]]), id = x)))
rownames(output) <- NULL
output
# d SD id
#1 1 3 1
#2 2 4 1
#3 2 1 2
#4 3 2 2
#5 7 6 2
#6 8 7 2
#7 5 3 3
#8 6 4 3
#9 8 1 3
#10 9 2 3
#11 4 6 3
#12 5 7 3
It might be a bit shorter using dplyr's bind_rows with purrr::map, but this gives the id variable as the names of the list (A, B, C) instead of a sequence, which is not difficult to change:
library(dplyr)
bind_rows(purrr::map(L, bind_rows), .id = "id") %>%
mutate(id = match(id, unique(id)))
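The match(id, unique(id)) idiom numbers the names in order of first appearance. A small sketch with made-up values (not the question's data) shows how this can differ from converting through a factor, which by default sorts the levels alphabetically:

```r
# Made-up names, deliberately not in alphabetical order
id <- c("B", "B", "A", "C")

# Order of first appearance: B -> 1, A -> 2, C -> 3
match(id, unique(id))
# 1 1 2 3

# Factor levels are sorted by default: A -> 1, B -> 2, C -> 3
as.integer(factor(id))
# 2 2 1 3
```

For the list in the question the names A, B, C are already sorted, so both variants would give the same ids.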
Upvotes: 3