Convert list of lists to data.frame

Question

I have a dataframe as follows:

library("dplyr")

df <- data.frame(
    name=c('group1', 'group2'), 
    n_success=c(32, 30), 
    n=c(122, 123), 
    stringsAsFactors = FALSE
)

For each group, I take 1000 samples from a beta distribution:

df <- df %>% 
  mutate(sims = list(rbeta(1000, 1+n_success, 1+n-n_success))) %>%
  select(name, sims)

# str(df)
# prints out:
# name: chr "group1" "group2"
# sims: List of 1

I now have a dataframe where each row holds a string and a list.

How do I go from this to a dataframe where the column names are "group1" and "group2", and each column holds the 1000 simulated values from above? Note that the number of groups is arbitrary, so if I had 12 groups, I would like 12 columns.

David · Accepted Answer

You can also stick to dplyr and the tidyverse. I would do it like this:

library(dplyr)
library(tidyr) # for unnest() and spread()

df <- data.frame(
  name=c('group1', 'group2'), 
  n_success=c(32, 30), 
  n=c(122, 123), 
  stringsAsFactors = FALSE
)

# continuing your approach (be aware that I added a list() and closed a missing parenthesis)
df2 <- df %>% 
  mutate(sims = list(rbeta(1000, 1+n_success, 1+n-n_success))) %>%
  select(name, sims)
str(df2)
#> 'data.frame':    2 obs. of  2 variables:
#>  $ name: chr  "group1" "group2"
#>  $ sims:List of 2
#>   ..$ : num  0.178 0.313 0.272 0.25 0.271 ...
#>   ..$ : num  0.178 0.313 0.272 0.25 0.271 ...


# unnest the list column, then number the draws within each group
df3 <- df2 %>% unnest(sims) %>% group_by(name) %>% mutate(num = row_number())
df3
#> # A tibble: 2,000 x 3
#> # Groups:   name [2]
#>      name      sims   num
#>     <chr>     <dbl> <int>
#>  1 group1 0.1779776     1
#>  2 group1 0.3134262     2
#>  3 group1 0.2724994     3
#>  4 group1 0.2496521     4
#>  5 group1 0.2714030     5
#>  6 group1 0.2192758     6
#>  7 group1 0.2056501     7
#>  8 group1 0.2210970     8
#>  9 group1 0.2505481     9
#> 10 group1 0.2945622    10
#> # ... with 1,990 more rows

# spread back to wide format: one column per group
df_final <- df3 %>% spread(key = name, value = sims)
df_final
#> # A tibble: 1,000 x 3
#>      num    group1    group2
#>  * <int>     <dbl>     <dbl>
#>  1     1 0.1779776 0.1779776
#>  2     2 0.3134262 0.3134262
#>  3     3 0.2724994 0.2724994
#>  4     4 0.2496521 0.2496521
#>  5     5 0.2714030 0.2714030
#>  6     6 0.2192758 0.2192758
#>  7     7 0.2056501 0.2056501
#>  8     8 0.2210970 0.2210970
#>  9     9 0.2505481 0.2505481
#> 10    10 0.2945622 0.2945622
#> # ... with 990 more rows
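
One thing to watch in the output above: the group1 and group2 columns are identical. Without rowwise(), mutate() calls rbeta() once on the whole n_success and n columns and recycles that single vector into every row, so the draws are not made per group. Here is a minimal sketch of a per-row version, assuming dplyr's rowwise() is acceptable here. The set.seed() call and the names shared and per_group are only illustrative.

```r
library(dplyr)

df <- data.frame(
  name = c("group1", "group2"),
  n_success = c(32, 30),
  n = c(122, 123),
  stringsAsFactors = FALSE
)

set.seed(42)  # illustrative only, so the sketch is reproducible

# One rbeta() call for the whole column: the same vector lands in every row
shared <- df %>%
  mutate(sims = list(rbeta(1000, 1 + n_success, 1 + n - n_success)))
identical(shared$sims[[1]], shared$sims[[2]])
#> [1] TRUE

# rowwise() evaluates the expression once per row,
# so each group gets its own 1000 draws
per_group <- df %>%
  rowwise() %>%
  mutate(sims = list(rbeta(1000, 1 + n_success, 1 + n - n_success))) %>%
  ungroup() %>%
  select(name, sims)
identical(per_group$sims[[1]], per_group$sims[[2]])
#> [1] FALSE
```

per_group has the same shape as df2, so it can go through the same unnest and spread steps.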

If you don't need the num variable, you can drop it again with select(df_final, -num).
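
In current tidyr releases spread() is superseded by pivot_wider(). The sketch below does the same reshape with it, assuming tidyr 1.0 or later. It uses rowwise() so that each group gets its own draws, and wide is just an illustrative name.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  name = c("group1", "group2"),
  n_success = c(32, 30),
  n = c(122, 123),
  stringsAsFactors = FALSE
)

wide <- df %>%
  rowwise() %>%                          # evaluate rbeta() once per group
  mutate(sims = list(rbeta(1000, 1 + n_success, 1 + n - n_success))) %>%
  ungroup() %>%
  select(name, sims) %>%
  unnest(sims) %>%                       # one row per draw
  group_by(name) %>%
  mutate(num = row_number()) %>%         # draw index within each group
  ungroup() %>%
  pivot_wider(names_from = name, values_from = sims) %>%
  select(-num)                           # drop the helper index

dim(wide)
#> [1] 1000    2
```

Nothing in the pipeline depends on the number of groups, so 12 rows in df should give 12 columns.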

Does that help you?
