rnorouzian

Reputation: 7517

Converting a list of lists of data.frames to a single data.frame in R

I have a list of lists of data.frames (see L below).

Is it possible to convert L into a single data.frame like the desired output shown below?

L <- list(A = list(Short = data.frame(d = 1:2, SD = 3:4)), 
          B = list(Short = data.frame(d = 2:3, SD = 1:2), Long1 = data.frame(d = 7:8, SD = 6:7)),
          C = list(Short = data.frame(d = 5:6, SD = 3:4), Long1 = data.frame(d = 8:9, SD = 1:2), 
               Long2 = data.frame(d = 4:5, SD = 6:7)))

Desired output (a data.frame):

d  SD id
1  3   1
2  4   1
2  1   2
3  2   2
7  6   2
8  7   2
5  3   3
6  4   3
8  1   3
9  2   3
4  6   3
5  7   3

Upvotes: 4

Views: 235

Answers (4)

Uwe

Reputation: 42592

rbindlist() is a convenience function that builds one data.table from a list of data.frames or data.tables. Because L is nested two levels deep, it has to be applied twice: once to each inner list via lapply(), and once to the resulting list.

Its idcol parameter adds a column to the result recording which list item each row came from.

library(data.table)
rbindlist(lapply(L, rbindlist, idcol = "es.type"), idcol = "id")
    id es.type d SD
 1:  A   Short 1  3
 2:  A   Short 2  4
 3:  B   Short 2  1
 4:  B   Short 3  2
 5:  B   Long1 7  6
 6:  B   Long1 8  7
 7:  C   Short 5  3
 8:  C   Short 6  4
 9:  C   Long1 8  1
10:  C   Long1 9  2
11:  C   Long2 4  6
12:  C   Long2 5  7

The OP has also requested that id be numeric and that Long1 and Long2 both become Long. This can be done with two further operations on the result columns:

rbindlist(lapply(L, rbindlist, idcol = "es.type"), idcol = "id")[
  , id := rleid(id)][
    , es.type := sub("\\d+$", "", es.type)][]
    id es.type d SD
 1:  1   Short 1  3
 2:  1   Short 2  4
 3:  2   Short 2  1
 4:  2   Short 3  2
 5:  2    Long 7  6
 6:  2    Long 8  7
 7:  3   Short 5  3
 8:  3   Short 6  4
 9:  3    Long 8  1
10:  3    Long 9  2
11:  3    Long 4  6
12:  3    Long 5  7
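
The two helper steps can also be tried in isolation. The following is only a small sketch: the vectors ids and types are made-up inputs, not taken from the answer. It shows that rleid() numbers runs of consecutive equal values, so a value that reappears later would start a new run.

```r
library(data.table)

# Hypothetical inputs, chosen only to illustrate the two helpers
ids   <- c("A", "A", "B", "B", "C")
types <- c("Short", "Long1", "Long2")

# One integer per run of consecutive equal values
run_ids <- rleid(ids)

# Drop trailing digits from the type labels
stripped <- sub("\\d+$", "", types)

# "A" appears in two separate runs here, so it gets two different ids
split_runs <- rleid(c("A", "B", "A"))

print(run_ids)
print(stripped)
print(split_runs)
```

With the names of L (A, B, C) each outer name forms exactly one run, which is why rleid() yields 1, 2, 3 here.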

In base R, the same row-binding can be done with

do.call("rbind", lapply(L, do.call, what = "rbind"))

which returns

          d SD
A.Short.1 1  3
A.Short.2 2  4
B.Short.1 2  1
B.Short.2 3  2
B.Long1.1 7  6
B.Long1.2 8  7
C.Short.1 5  3
C.Short.2 6  4
C.Long1.1 8  1
C.Long1.2 9  2
C.Long2.1 4  6
C.Long2.2 5  7

id and es.type can be recovered by parsing the row names (here with the stringr package), e.g.,

DF <- do.call("rbind", lapply(L, do.call, what = "rbind"))
id <- stringr::str_extract(row.names(DF), "^[^.]*")
# create sequence number (that's what data.table::rleid() does)
DF$id <- c(1L, cumsum(head(id, -1L) != tail(id, -1L)) + 1L)
DF$es.type <- stringr::str_extract(row.names(DF), "(?<=\\.)[^.0-9]*")
row.names(DF) <- NULL
DF
   d SD id es.type
1  1  3  1   Short
2  2  4  1   Short
3  2  1  2   Short
4  3  2  2   Short
5  7  6  2    Long
6  8  7  2    Long
7  5  3  3   Short
8  6  4  3   Short
9  8  1  3    Long
10 9  2  3    Long
11 4  6  3    Long
12 5  7  3    Long
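
For readers who want to avoid the stringr dependency, the same parsing can probably be done with sub() and match() alone. This is a sketch, not part of the original answer; it assumes the list names themselves contain no dots, and the variable names rn and outer_name are introduced here for illustration.

```r
L <- list(A = list(Short = data.frame(d = 1:2, SD = 3:4)),
          B = list(Short = data.frame(d = 2:3, SD = 1:2),
                   Long1 = data.frame(d = 7:8, SD = 6:7)),
          C = list(Short = data.frame(d = 5:6, SD = 3:4),
                   Long1 = data.frame(d = 8:9, SD = 1:2),
                   Long2 = data.frame(d = 4:5, SD = 6:7)))

DF <- do.call("rbind", lapply(L, do.call, what = "rbind"))
rn <- row.names(DF)                       # e.g. "B.Long1.2"

# Everything before the first dot is the outer list name
outer_name <- sub("\\..*$", "", rn)
# Position of each outer name among the distinct names, in order of appearance
DF$id <- match(outer_name, unique(outer_name))

# Second dot-separated field, up to the first digit or dot
DF$es.type <- sub("^[^.]*\\.([^.0-9]*).*$", "\\1", rn)

row.names(DF) <- NULL
DF
```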

Upvotes: 0

Joris C.

Reputation: 6244

Here is another possible approach using purrr's flatten_dfr:

library(purrr)

# times = number of rows (not columns) contributed by each outer element
transform(flatten_dfr(L), id = rep(seq_along(L), times = map_int(L, ~sum(map_int(.x, nrow)))))
#>    d SD id
#> 1  1  3  1
#> 2  2  4  1
#> 3  2  1  2
#> 4  3  2  2
#> 5  7  6  2
#> 6  8  7  2
#> 7  5  3  3
#> 8  6  4  3
#> 9  8  1  3
#> 10 9  2  3
#> 11 4  6  3
#> 12 5  7  3

NB: here I used base R's transform(), which could be replaced by dplyr's mutate().
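
One detail worth checking when building the id column this way: rep() needs, for each outer element, the number of rows it contributes. lengths() on a list of data.frames returns column counts, which match the row counts in the question's data only because every data.frame there is 2 x 2. A base R sketch with a hypothetical list L2 of uneven sizes (not from the question) makes the difference visible:

```r
# Hypothetical input: data.frames with different numbers of rows
L2 <- list(A = list(Short = data.frame(d = 1:3, SD = 4:6)),
           B = list(Short = data.frame(d = 1L, SD = 2L),
                    Long1 = data.frame(d = 7:8, SD = 6:7)))

# Rows contributed by each outer element
rows_per_outer <- vapply(L2, function(x) sum(vapply(x, nrow, integer(1))), integer(1))
id <- rep(seq_along(L2), times = rows_per_outer)

# For comparison: lengths() counts the columns of each data.frame
cols_per_outer <- vapply(L2, function(x) sum(lengths(x)), integer(1))

print(rows_per_outer)
print(cols_per_outer)
print(id)
```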

Upvotes: 0

akrun

Reputation: 887971

We can use lapply() and Map() in base R: loop over the outer list with lapply() and rbind each inner list, use Map() to cbind an id column onto each result, then rbind the outer list elements.

out <- do.call(rbind, Map(cbind, lapply(L, function(x) 
              do.call(rbind, x)), id = seq_along(L)))
row.names(out) <- NULL
out
#   d SD id
#1  1  3  1
#2  2  4  1
#3  2  1  2
#4  3  2  2
#5  7  6  2
#6  8  7  2
#7  5  3  3
#8  6  4  3
#9  8  1  3
#10 9  2  3
#11 4  6  3
#12 5  7  3
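
To see what the Map(cbind, ..., id = ...) step does on its own, here is a minimal sketch on a hypothetical two-element list dfs (not from the question). Map() walks the list and the id vector in parallel, so each data.frame receives its own position as a constant column.

```r
# Hypothetical input: two small data.frames
dfs <- list(data.frame(d = 1:2), data.frame(d = 3:4))

# Pair each data.frame with its index and bind the index on as a column
tagged <- Map(cbind, dfs, id = seq_along(dfs))

# Stack the tagged pieces
out <- do.call(rbind, tagged)
out
```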

Based on the comments, to also add a column built from the names of the inner list:

out1 <- do.call(rbind, Map(cbind, lapply(L, function(dat)
   do.call(rbind, Map(cbind, dat, es.type = names(dat)))), id = seq_along(L)))
row.names(out1) <- NULL

out1
#   d SD es.type id
#1  1  3   Short  1
#2  2  4   Short  1
#3  2  1   Short  2
#4  3  2   Short  2
#5  7  6   Long1  2
#6  8  7   Long1  2
#7  5  3   Short  3
#8  6  4   Short  3
#9  8  1   Long1  3
#10 9  2   Long1  3
#11 4  6   Long2  3
#12 5  7   Long2  3

If the inner names end in digits (optionally preceded by dots) and we want to remove them:

out1 <- do.call(rbind, Map(cbind, lapply(L, function(dat)
   do.call(rbind, Map(cbind, dat, 
     es.type = sub("\\.*\\d+$", "", names(dat))))), id = seq_along(L)))
row.names(out1) <- NULL
out1
#   d SD es.type id
#1  1  3   Short  1
#2  2  4   Short  1
#3  2  1   Short  2
#4  3  2   Short  2
#5  7  6    Long  2
#6  8  7    Long  2
#7  5  3   Short  3
#8  6  4   Short  3
#9  8  1    Long  3
#10 9  2    Long  3
#11 4  6    Long  3
#12 5  7    Long  3

Upvotes: 1

Ronak Shah

Reputation: 389325

We could rbind the data.frames in each element of L, add a new column holding that element's position in L, and finally combine everything into one data.frame using do.call and rbind.

output <- do.call(rbind, lapply(seq_along(L), function(x) 
                          transform(do.call(rbind, L[[x]]), id = x)))
rownames(output) <- NULL

output
#   d SD id
#1  1  3  1
#2  2  4  1
#3  2  1  2
#4  3  2  2
#5  7  6  2
#6  8  7  2
#7  5  3  3
#8  6  4  3
#9  8  1  3
#10 9  2  3
#11 4  6  3
#12 5  7  3

This might be a bit shorter using dplyr's bind_rows with purrr::map. However, .id then holds the names of the list (A, B, C) rather than a sequence number, so a mutate() step converts it.

library(dplyr)
bind_rows(purrr::map(L, bind_rows), .id = "id")  %>%
          mutate(id = match(id, unique(id)))
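
The conversion in the mutate() call relies on match() returning, for each value, its position among the distinct values in order of first appearance. A tiny base R sketch on a made-up vector id_chr (hypothetical, for illustration only):

```r
# Hypothetical id column as bind_rows(.id = "id") would produce it
id_chr <- c("A", "A", "B", "C", "C")

# Position of each name among the distinct names
id_num <- match(id_chr, unique(id_chr))

# Equivalent route via a factor whose levels keep the order of appearance
id_fct <- as.integer(factor(id_chr, levels = unique(id_chr)))

print(id_num)
```

Unlike a run-based counter, this gives a repeated name the same number wherever it appears.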

Upvotes: 3
