Reputation: 389

Unlist list of data frames to one data frame with change structure

I have a list containing a large set of data frames (each data frame represents a year). Each data frame consists of two variables: (1) ID-variable, and (2) a Bernoulli variable indicating if an event happened that year. I want to want to unlist the list, and instead, create one data frame where each year's outcome variable is its own column.

Below, you can find code to generate example data that share the same feature as my data.

set.seed(123)
df <- list()
year <- (1950:1960)

# Simulating data. 
for (i in 1:length(year)) {
  id <- sample(1:20, 10)
  outcome <- sample(0:1, 10, replace = T)
  
  df[[i]] <- cbind(id, outcome)
}

So in practical terms, I want to unlist df and create a data frame that looks something like this:

      id outcome1950 outcome1951 outcome1952 outcome1953 outcome1954 outcome1955 [...]
 [1,]  1               
 [2,]  2       
 [3,]  3       
 [4,]  4       
 [5,]  5       
 [6,]  6       
 [7,]  7       
 [8,]  8       
 [9,]  9       
[10,] 10       
[11,] 11       
[12,] 12       
[13,] 13       
[14,] 14       
[15,] 15       
[16,] 16       
[17,] 17       
[18,] 18       
[19,] 19       
[20,] 20

Of course, the outcome should have each value for each year.

NOTE: All ID's doesn't exist in all years, in those cases, I want to add a NA.

Upvotes: 0

Answers (3)

Roccer

Reputation: 919

library(tidyverse)

names(df) <- paste0('outcome', year)

df %>% purrr::map_df(as.data.frame, tidyr::pivot_wider(names_from dplyr::arrange(id)

# A tibble: 20 x 12 id outcome1950 <int> 1     1          NA 2     2           1 3     3           1 4     4           0 5     5           0 6     6           0 7     7          NA 8     8          NA 9     9          NA 10    10           0 11    11           0 12    12          NA 13    13          NA 14    14           0 15    15           1 16    16          NA 17    17          NA 18    18          NA 19    19           1 20    20          NA

.id = 'name') %>% = name, values_from = outcome) %>% outcome1951 outcome1952 outcome1953 outcome1954 outcome1955 outcome1956 outcome1957 outcome1958 outcome1959 outcome1960 <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> 0 NA 0 NA 0 NA NA 0 NA NA NA 0 1 NA 1 NA 1 NA 0 NA 0 NA NA 1 NA NA 0 0 NA NA 0 NA NA NA 0 NA 1 0 1 NA NA 0 NA NA NA 1 NA NA NA NA NA 0 1 NA 1 1 NA 0 1 NA 1 0 0 1 NA 1 0 NA NA 1 0 0 0 1 NA NA 1 1 0 0 0 0 NA 1 NA NA 1 NA 0 0 0 0 NA NA 0 1 NA NA 1 1 NA NA 1 NA NA 1 NA 1 0 NA NA NA NA 0 1 NA NA 1 0 0 NA 0 NA 1 NA NA 1 0 1 0 1 NA NA 0 NA 0 NA NA 0 0 NA 1 0 NA 0 1 NA 1 NA NA NA NA 0 0 0 NA NA 0 NA 0 NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA 1 NA NA NA 1 0 NA 0 0 1 0 1 NA NA NA 1 1 1 0 0 0 1 NA NA 0

Upvotes: 2

DPH

Reputation: 4344

if I understood your problem correctly a possible solution would be this:

library(dplyr)
library(tidyr)
library(purrr)
library(plyr)

purrr::map2(df, year, ~ cbind(.y, .x))  %>% 
  plyr::ldply(data.frame) %>% 
  dplyr::rename(YEAR = 1) %>% 
  tidyr::pivot_wider(names_from = "YEAR", names_prefix = "outcome", values_from = "outcome") %>% 
  dplyr::arrange(id)

Upvotes: 1

DaveArmstrong

Reputation: 21757

This should do it.

set.seed(123)
df <- list()
year <- (1950:1960)

# Simulating data. 
for (i in 1:length(year)) {
  id <- sample(1:20, 10)
  outcome <- sample(0:1, 10, replace = T)
  
  df[[i]] <- cbind(id, outcome)
}

## add names to df
names(df) <- year

## turn each element in df into a tibble (or data frame) and make a 
## variable called "year" that has is "outcomeYYYY"
for(i in 1:length(df)){
  df[[i]] <- as_tibble(df[[i]]) %>% mutate(year = paste0("outcome", names(df)[i]))
}

## turn the elements of the list into a single data tibble
df <- bind_rows(df)

## pivot to wider and arrange by id
dfwide <- df %>% 
  pivot_wider(names_from="year", values_from="outcome") %>% 
  arrange(id)

head(dfwide)
# A tibble: 6 x 12
#      id outcome1950 outcome1951 outcome1952 outcome1953 outcome1954
#   <int>       <int>       <int>       <int>       <int>       <int>
# 1     1          NA           0          NA           0          NA
# 2     2           1          NA           0           1          NA
# 3     3           1           0          NA          NA           1
# 4     4           0           0          NA          NA          NA
# 5     5           0          NA           0          NA          NA
# 6     6           0          NA           0           1          NA
# … with 6 more variables: outcome1955 <int>, outcome1956 <int>,
#   outcome1957 <int>, outcome1958 <int>, outcome1959 <int>,
#   outcome1960 <int>

Upvotes: 2

Unlist list of data frames to one data frame with change structure

Answers (3)

Related Questions