Reputation: 4636
I know there are many list-to-dataframe questions, but I can't find the solution to this easy problem. Let's say I have:
library(tidyverse)
library(janitor)
df <- data.frame(group = c(rep("A", 3), rep("B", 6)),
                 test_value = c(0, 1, 2, 0, 1, 2, 3, 4, 5))
df_list <- df %>%
  split(.$group) %>%
  map(~tabyl(.x$test_value))
df_list
# $A
# .x$test_value n percent
# 0 1 0.3333333
# 1 1 0.3333333
# 2 1 0.3333333
# $B
# .x$test_value n percent
# 0 1 0.1666667
# 1 1 0.1666667
# 2 1 0.1666667
# 3 1 0.1666667
# 4 1 0.1666667
# 5 1 0.1666667
All I want to do is convert it to a single data frame with group-prefixed column names, like this:
A_test_value A_n A_percent B_test_value B_n B_percent
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 1 0.333 0 1 0.167
2 1 1 0.333 1 1 0.167
3 2 1 0.333 2 1 0.167
4 NA NA NA 3 1 0.167
5 NA NA NA 4 1 0.167
6 NA NA NA 5 1 0.167
I've seen this question, but it is slightly different: "Converting nested list (unequal length) to data frame".
Would anyone have a quick solution (maybe dplyr-style), please?
Upvotes: 2
Views: 112
Reputation: 5788
Base R solution:
# Create the data:
df <- data.frame(group = c(rep("A",3), rep("B", 6)),
test_value = c(0,1,2, 0,1,2,3,4,5))
# Create the dataframe list, splitting on group:
df_list <- lapply(split(df, df$group), data.frame)
# Add the n and percent columns (hard-codes n = 1, i.e. assumes each test_value occurs once per group):
df_list <- mapply(cbind, df_list, "n" = 1, "percent" = 1/sapply(df_list, nrow), SIMPLIFY = FALSE)
# Row bind the dataframe list together into a single dataframe:
df2 <- data.frame(do.call(rbind, df_list), row.names = NULL, stringsAsFactors = FALSE)
# Spread by the test_value:
df2 <- reshape(df2, idvar = 'test_value', ids = unique(df2$test_value), direction = 'wide', timevar = 'group')
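The steps above set n = 1 by hand, which only holds when every test_value appears once per group. A minimal sketch of the same base R approach that counts with table() instead might look like the following; the helper name tabulate_group is made up for illustration and is not part of the answer.

```r
df <- data.frame(group = c(rep("A", 3), rep("B", 6)),
                 test_value = c(0, 1, 2, 0, 1, 2, 3, 4, 5))

# Hypothetical helper: counts and proportions per distinct test_value in one group.
tabulate_group <- function(d) {
  counts <- table(d$test_value)
  data.frame(test_value = as.numeric(names(counts)),
             n = as.vector(counts),
             percent = as.vector(counts) / sum(counts))
}

# One table per group.
df_list <- lapply(split(df, df$group), tabulate_group)

# Stack the tables, keeping the group name as a column.
df2 <- do.call(rbind, lapply(names(df_list), function(g) {
  data.frame(group = g, df_list[[g]])
}))
rownames(df2) <- NULL

# Spread to wide form: one n and one percent column per group.
wide <- reshape(df2, idvar = "test_value", timevar = "group", direction = "wide")
```

As with the original, reshape() names the columns n.A, percent.A and so on, and keeps a single shared test_value column.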
Upvotes: 1
Reputation: 34441
Perhaps you want to join?
library(dplyr)
library(purrr)
library(janitor)
df %>%
  group_split(group) %>%
  map(~tabyl(.x, test_value)) %>%
  reduce(full_join, by = "test_value")
test_value n.x percent.x n.y percent.y
1 0 1 0.3333333 1 0.1666667
2 1 1 0.3333333 1 0.1666667
3 2 1 0.3333333 1 0.1666667
4 3 NA NA 1 0.1666667
5 4 NA NA 1 0.1666667
6 5 NA NA 1 0.1666667
For named output indicating group you could do:
df %>%
  split(.$group) %>%
  map(~tabyl(.x, test_value)) %>%
  imap(~set_names(.x, ifelse(names(.x) != "test_value", paste(.y, names(.x), sep = "_"), names(.x)))) %>%
  reduce(full_join, by = "test_value")
test_value A_n A_percent B_n B_percent
1 0 1 0.3333333 1 0.1666667
2 1 1 0.3333333 1 0.1666667
3 2 1 0.3333333 1 0.1666667
4 3 NA NA 1 0.1666667
5 4 NA NA 1 0.1666667
6 5 NA NA 1 0.1666667
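The join keeps one shared test_value column. If the per-group test_value columns from the question are wanted, one alternative is to pad each table with NA rows to a common length and bind the tables side by side. This is only a sketch in base R: df_list is built by hand as a stand-in for the tabyl() output, and side_by_side is a hypothetical helper name. Note that it aligns rows by position, not by value.

```r
# Hand-built stand-in for the per-group tabyl() tables (assumption, so no packages are needed).
df_list <- list(
  A = data.frame(test_value = c(0, 1, 2), n = 1, percent = 1/3),
  B = data.frame(test_value = c(0, 1, 2, 3, 4, 5), n = 1, percent = 1/6)
)

# Hypothetical helper: pad every table to the longest one and bind column-wise.
side_by_side <- function(tables) {
  n_rows <- max(vapply(tables, nrow, integer(1)))
  padded <- lapply(names(tables), function(g) {
    # Indexing past the last row of a data frame yields rows of NA.
    out <- tables[[g]][seq_len(n_rows), , drop = FALSE]
    rownames(out) <- NULL
    # Prefix every column with the group name, e.g. A_test_value.
    names(out) <- paste(g, names(out), sep = "_")
    out
  })
  do.call(cbind, padded)
}

res <- side_by_side(df_list)
```

Because the padding is positional, this matches the requested layout only when the shorter groups' values line up with the first rows of the longer ones, as they do in the example data.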
Upvotes: 5
Reputation: 72909
You could first add column prefixes according to the names of the sublists in the main list, then add a duplicate of the value column under the same name in every sublist, e.g. "by", to merge (aka join) by later.
df_list <- Map(function(x) {
  out <- `names<-`(
    df_list[[x]], paste0(x, "_", c("test_value", "n", "percent")))
  cbind(out, by = out[, 1])
}, names(df_list))
res <- merge(df_list$A, df_list$B, all=TRUE)[, -1]
res
# A_test_value A_n A_percent B_test_value B_n B_percent
# 1 0 1 0.3333333 0 1 0.1666667
# 2 1 1 0.3333333 1 1 0.1666667
# 3 2 1 0.3333333 2 1 0.1666667
# 4 NA NA NA 3 1 0.1666667
# 5 NA NA NA 4 1 0.1666667
# 6 NA NA NA 5 1 0.1666667
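The merge above names groups A and B explicitly. With more groups, the same idea could be folded over the list with Reduce(). The following is a rough sketch, not part of the answer: df_list is rebuilt by hand so the block runs on its own, and group C is a hypothetical third group added purely to show the generalization.

```r
# Hand-built stand-in for the per-group tables; group C is hypothetical.
df_list <- list(
  A = data.frame(test_value = c(0, 1, 2), n = 1, percent = 1/3),
  B = data.frame(test_value = c(0, 1, 2, 3, 4, 5), n = 1, percent = 1/6),
  C = data.frame(test_value = c(1, 7), n = 1, percent = 1/2)
)

# Prefix the columns with the group name and add a shared key column "by".
keyed <- lapply(names(df_list), function(g) {
  out <- df_list[[g]]
  names(out) <- paste0(g, "_", names(out))
  cbind(out, by = out[[1]])
})

# Full outer merge across all groups on the key, then drop the key.
merged <- Reduce(function(x, y) merge(x, y, by = "by", all = TRUE), keyed)
res <- merged[, names(merged) != "by"]
```

Here rows are aligned by value, so group C's value 1 lands on the same row as the 1s from A and B, and its value 7 gets a row of its own.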
Upvotes: 2