Javier Cabeza

Reputation: 13

Converting a list of nested lists to a data frame in R

I'm new to R and I'm having trouble converting a nested list into a data frame.

I have a list "ddt":

str(ddt)
 List of 4
 $ id           : chr "18136"
 $ comments.data:List of 3
  ..$ :List of 3
  .. ..$ timestamp: chr "2020-05-25T16:17:32+0000"
  .. ..$ text     : chr "Mocaaa"
  .. ..$ id       : chr "18096"
  ..$ :List of 3
  .. ..$ timestamp: chr "2020-05-25T16:00:00+0000"
  .. ..$ text     : chr "Capucchino"
  .. ..$ id       : chr "17846"
  ..$ :List of 3
  .. ..$ timestamp: chr "2020-05-25T14:42:53+0000"
  .. ..$ text     : chr "Mocachino"
  .. ..$ id       : chr "18037"

 $ id           : chr "17920"
 $ comments.data:List of 1
  ..$ :List of 3
  .. ..$ timestamp: chr "2020-05-24T15:31:30+0000"
  .. ..$ text     : chr "Hello"
  .. ..$ id       : chr "18054"

And I need this result:

     id                  timestamp         text     id2
1 18136   2020-05-25T16:17:32+0000       Mocaaa   18096
2 18136   2020-05-25T16:00:00+0000   Capucchino   17846
3 18136   2020-05-25T14:42:53+0000    Mocachino   18037
4 17920   2020-05-24T15:31:30+0000        Hello   18054

Upvotes: 1

Views: 70

Answers (2)

Onyambu

Reputation: 79338

The structure looks just like a JavaScript object (JSON).

You could do the following, where df is a nested list like the sample data in r2evans's answer:

library(jsonlite)
library(tidyr)

unnest(unnest(fromJSON(toJSON(df))))

# A tibble: 6 x 4
     id tm                  text    id1
  <int> <chr>               <chr> <int>
1 92345 2020-05-26 14:53:53 X      6730
2 92345 2020-05-26 14:53:56 Q     92812
3 92345 2020-05-26 14:53:56 D     25304
4  9847 2020-05-26 14:53:56 E     82734
5  9847 2020-05-26 14:54:01 I     75079
6  9847 2020-05-26 14:54:02 H     89373
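Recent tidyr releases ask for the columns to unnest explicitly and complain about the duplicated id name, so here is a minimal sketch of the same round trip under those assumptions. The sample values are made up for illustration, and auto_unbox = TRUE and names_sep = "_" are my choices, not part of the call above.

```r
library(jsonlite)
library(tidyr)

# Hypothetical sample data with the same shape as `df` (values are made up).
df <- list(
  list(id = 1L, comments = list(
    list(tm = "2020-05-26 14:53:53", text = "X", id = 11L),
    list(tm = "2020-05-26 14:53:56", text = "Q", id = 12L)
  )),
  list(id = 2L, comments = list(
    list(tm = "2020-05-26 14:53:56", text = "E", id = 21L)
  ))
)

# auto_unbox = TRUE writes length-1 vectors as JSON scalars, so fromJSON()
# gives back an atomic `id` column plus one list-column of data frames.
flat <- fromJSON(toJSON(df, auto_unbox = TRUE))

# names_sep prefixes the inner columns, which avoids the clash between the
# outer `id` and the inner `id`.
out <- unnest(flat, cols = comments, names_sep = "_")
out
```

With the values unboxed, a single unnest() should be enough, and the inner columns come out as comments_tm, comments_text and comments_id.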

Upvotes: 1

r2evans

Reputation: 160892

I think this can be done well with data.table.

set.seed(42)
df <- replicate(2, list(
  id = sample(1e5, 1),
  comments = replicate(3, list(
    tm = as.character(Sys.time() + sample(10, 1)),
    text = sample(LETTERS, 1),
    id = sample(1e5, 1)
  ), simplify = FALSE)
), simplify = FALSE)
str(df)
# List of 2
#  $ :List of 2
#   ..$ id      : int 91481
#   ..$ comments:List of 3
#   .. ..$ :List of 3
#   .. .. ..$ tm  : chr "2020-05-26 14:44:08"
#   .. .. ..$ text: chr "H"
#   .. .. ..$ id  : int 83045
#   .. ..$ :List of 3
#   .. .. ..$ tm  : chr "2020-05-26 14:44:05"
#   .. .. ..$ text: chr "N"
#   .. .. ..$ id  : int 73659
#   .. ..$ :List of 3
#   .. .. ..$ tm  : chr "2020-05-26 14:44:00"
#   .. .. ..$ text: chr "R"
#   .. .. ..$ id  : int 70507
#  $ :List of 2
#   ..$ id      : int 45775
#   ..$ comments:List of 3
#   .. ..$ :List of 3
#   .. .. ..$ tm  : chr "2020-05-26 14:44:06"
#   .. .. ..$ text: chr "Y"
#   .. .. ..$ id  : int 25543
#   .. ..$ :List of 3
#   .. .. ..$ tm  : chr "2020-05-26 14:44:03"
#   .. .. ..$ text: chr "Y"
#   .. .. ..$ id  : int 97823
#   .. ..$ :List of 3
#   .. .. ..$ tm  : chr "2020-05-26 14:44:00"
#   .. .. ..$ text: chr "M"
#   .. .. ..$ id  : int 56034

One thing we'll have to contend with is that the name id appears both at the top level and inside each comment list, so one of the two has to be renamed before they can sit in the same table.

library(data.table)
library(magrittr) # for %>%, demonstrative only, can be done without
data.table::rbindlist(df) %>%
  .[, comments := lapply(comments, as.data.table) ] %>%
  # we have a duplicate name 'id', rename in the inner ones
  .[, comments := lapply(comments, setnames, "id", "innerid") ] %>%
  .[, unlist(comments, recursive = FALSE), by = seq_len(nrow(.)) ]
#    seq_len                  tm text innerid
# 1:       1 2020-05-26 14:49:21    H   83045
# 2:       2 2020-05-26 14:49:18    N   73659
# 3:       3 2020-05-26 14:49:13    R   70507
# 4:       4 2020-05-26 14:49:19    Y   25543
# 5:       5 2020-05-26 14:49:16    Y   97823
# 6:       6 2020-05-26 14:49:13    M   56034

I suspect that by = seq_len(nrow(.)) will not scale well to larger data. Since Rdatatable/data.table#3672 is still open, an alternative is to replace the last line (including unlist and seq_len) with just %>% tidyr::unnest(comments). Combining data.table and tidyr is at times contentious, but I suggest that this non-partisan approach capitalizes on the strengths of both.
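If the real data is the flat ddt list from the question (alternating id and comments.data entries) rather than my sample df, a different route is to bind each comment block on its own and attach the outer id afterwards. This is only a sketch: it assumes the id and comments.data entries pair up by position, and the names ids, comments, parts and result are mine.

```r
library(data.table)

# The question's list, rebuilt by hand from its str() output.
ddt <- list(
  id = "18136",
  comments.data = list(
    list(timestamp = "2020-05-25T16:17:32+0000", text = "Mocaaa", id = "18096"),
    list(timestamp = "2020-05-25T16:00:00+0000", text = "Capucchino", id = "17846"),
    list(timestamp = "2020-05-25T14:42:53+0000", text = "Mocachino", id = "18037")
  ),
  id = "17920",
  comments.data = list(
    list(timestamp = "2020-05-24T15:31:30+0000", text = "Hello", id = "18054")
  )
)

# Assumption: the n-th `id` belongs to the n-th `comments.data`.
ids      <- unlist(ddt[names(ddt) == "id"], use.names = FALSE)
comments <- ddt[names(ddt) == "comments.data"]

parts <- Map(function(post_id, cmts) {
  inner <- rbindlist(cmts)          # one row per comment
  setnames(inner, "id", "id2")      # free up the name `id` for the outer id
  inner[, id := post_id]            # recycle the outer id over the rows
  setcolorder(inner, "id")          # move the outer id to the front
  inner
}, ids, comments)

result <- rbindlist(parts)
result
```

This keeps the outer id in the result and needs neither by = seq_len(...) nor unnest, at the cost of one rbindlist call per top-level entry.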

Upvotes: 1
