Reputation: 389
I have a list containing a large set of data frames (each data frame represents a year). Each data frame consists of two variables: (1) ID-variable, and (2) a Bernoulli variable indicating if an event happened that year. I want to want to unlist the list, and instead, create one data frame where each year's outcome variable is its own column.
Below, you can find code to generate example data that share the same feature as my data.
set.seed(123)
df <- list()
year <- (1950:1960)
# Simulating data.
for (i in 1:length(year)) {
id <- sample(1:20, 10)
outcome <- sample(0:1, 10, replace = T)
df[[i]] <- cbind(id, outcome)
}
So in practical terms, I want to unlist df
and create a data frame that looks something like this:
id outcome1950 outcome1951 outcome1952 outcome1953 outcome1954 outcome1955 [...]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
[5,] 5
[6,] 6
[7,] 7
[8,] 8
[9,] 9
[10,] 10
[11,] 11
[12,] 12
[13,] 13
[14,] 14
[15,] 15
[16,] 16
[17,] 17
[18,] 18
[19,] 19
[20,] 20
Of course, the outcome should have each value for each year.
NOTE: All ID's doesn't exist in all years, in those cases, I want to add a NA
.
Upvotes: 0
Views: 671
Reputation: 919
library(tidyverse)
names(df) <- paste0('outcome', year)
df %>%
purrr::map_df(as.data.frame, .id = 'name') %>%
tidyr::pivot_wider(names_from = name, values_from = outcome) %>%
dplyr::arrange(id)
# A tibble: 20 x 12
id outcome1950 outcome1951 outcome1952 outcome1953 outcome1954 outcome1955 outcome1956 outcome1957 outcome1958 outcome1959 outcome1960
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 NA 0 NA 0 NA 0 NA NA 0 NA NA
2 2 1 NA 0 1 NA 1 NA 1 NA 0 NA
3 3 1 0 NA NA 1 NA NA 0 0 NA NA
4 4 0 0 NA NA NA 0 NA 1 0 1 NA
5 5 0 NA 0 NA NA NA 1 NA NA NA NA
6 6 0 NA 0 1 NA 1 1 NA 0 1 NA
7 7 NA 1 0 0 1 NA 1 0 NA NA 1
8 8 NA 0 0 0 1 NA NA 1 1 0 0
9 9 NA 0 0 NA 1 NA NA 1 NA 0 0
10 10 0 0 0 NA NA 0 1 NA NA 1 1
11 11 0 NA NA 1 NA NA 1 NA 1 0 NA
12 12 NA NA NA NA 0 1 NA NA 1 0 0
13 13 NA NA 0 NA 1 NA NA 1 0 1 0
14 14 0 1 NA NA 0 NA 0 NA NA 0 0
15 15 1 NA 1 0 NA 0 1 NA 1 NA NA
16 16 NA NA NA 0 0 0 NA NA 0 NA 0
17 17 NA NA NA NA NA NA NA 1 NA NA NA
18 18 NA NA NA NA NA NA 1 NA NA NA 1
19 19 1 0 NA 0 0 1 0 1 NA NA NA
20 20 NA 1 1 1 0 0 0 1 NA NA 0
Upvotes: 2
Reputation: 4344
if I understood your problem correctly a possible solution would be this:
library(dplyr)
library(tidyr)
library(purrr)
library(plyr)
purrr::map2(df, year, ~ cbind(.y, .x)) %>%
plyr::ldply(data.frame) %>%
dplyr::rename(YEAR = 1) %>%
tidyr::pivot_wider(names_from = "YEAR", names_prefix = "outcome", values_from = "outcome") %>%
dplyr::arrange(id)
Upvotes: 1
Reputation: 21757
This should do it.
set.seed(123)
df <- list()
year <- (1950:1960)
# Simulating data.
for (i in 1:length(year)) {
id <- sample(1:20, 10)
outcome <- sample(0:1, 10, replace = T)
df[[i]] <- cbind(id, outcome)
}
## add names to df
names(df) <- year
## turn each element in df into a tibble (or data frame) and make a
## variable called "year" that has is "outcomeYYYY"
for(i in 1:length(df)){
df[[i]] <- as_tibble(df[[i]]) %>% mutate(year = paste0("outcome", names(df)[i]))
}
## turn the elements of the list into a single data tibble
df <- bind_rows(df)
## pivot to wider and arrange by id
dfwide <- df %>%
pivot_wider(names_from="year", values_from="outcome") %>%
arrange(id)
head(dfwide)
# A tibble: 6 x 12
# id outcome1950 outcome1951 outcome1952 outcome1953 outcome1954
# <int> <int> <int> <int> <int> <int>
# 1 1 NA 0 NA 0 NA
# 2 2 1 NA 0 1 NA
# 3 3 1 0 NA NA 1
# 4 4 0 0 NA NA NA
# 5 5 0 NA 0 NA NA
# 6 6 0 NA 0 1 NA
# … with 6 more variables: outcome1955 <int>, outcome1956 <int>,
# outcome1957 <int>, outcome1958 <int>, outcome1959 <int>,
# outcome1960 <int>
Upvotes: 2