Reputation: 437
I am stuck at a question and would appreciate any ideas for the same
Consider I have a list of 3 data frames:
D1 <- data.frame(ID = sample(1:10), Y = sample(c("yes", "no"), 10, replace =TRUE))
D2 <- data.frame(ID = sample(5:14), Y = sample(c("yes", "no"), 10, replace = TRUE))
D3 <- data.frame(ID = sample(7:16), Y = sample(c("yes", "no"), 10, replace = TRUE))
L <- list(D1, D2, D3)
names <- c(D1, D2, D3)
names(L) <- names
DF<-as.list(names(L))
L <- mapply(cbind, L, "DF"=DF, SIMPLIFY=F)
I would like to alter the variable ID in such a way that if a particular ID, ID=16 for example, does not occur in D1 and D2 but only in D3, I want to append it as a value to the ID column in D1 and D2 and fill the corresponding Y column with 0.
All the three elements of the list should have the same ID columns eventually which is a union of ID values from the three data frames
I tried to do an rbind.fill or something similar but cannot figure it out and I am a little weak in handling lists in general. Any help will be much appreciated.
Thank you so much and hope you have a good weekend!
Upvotes: 1
Views: 63
Reputation: 214927
You can create a single column dataframe IDs
and then merge with each dataframe in the list with all = T
which add all IDs to each dataframe.
library(dplyr)
IDs <- data.frame(ID = unique(c(D1$ID, D2$ID, D3$ID)))
L <- lapply(L, function(df) merge(df, IDs, by = "ID", all = T) %>%
mutate(Y = ifelse(is.na(Y), 0, Y)))
Upvotes: 1
Reputation: 887028
Here is another option using full_join
from dplyr
library(dplyr)
Reduce(function(...) full_join(..., by="ID"), L)
# ID Y.x Y.y Y
#1 10 yes no yes
#2 9 no yes yes
#3 3 no <NA> <NA>
#4 6 yes no <NA>
#5 4 yes <NA> <NA>
#6 8 no no yes
#7 5 no yes <NA>
#8 1 yes <NA> <NA>
#9 2 yes <NA> <NA>
#10 7 no no yes
#11 14 <NA> no no
#12 12 <NA> yes yes
#13 11 <NA> no no
#14 13 <NA> yes no
#15 16 <NA> <NA> yes
#16 15 <NA> <NA> no
Upvotes: 0
Reputation: 160417
Another option.
This will require some post-processing clean-up, but:
set.seed(42)
D1 <- data.frame(ID = sample(1:10), Y = sample(c("yes", "no"), 10, replace =TRUE))
D2 <- data.frame(ID = sample(5:14), Y = sample(c("yes", "no"), 10, replace = TRUE))
D3 <- data.frame(ID = sample(7:16), Y = sample(c("yes", "no"), 10, replace = TRUE))
L <- list(D1, D2, D3)
Reduce(function(a,b) merge(a, b, by="ID", all.x=TRUE, all.y=TRUE), L)
# ID Y.x Y.y Y
# 1 1 yes <NA> <NA>
# 2 2 yes <NA> <NA>
# 3 3 no <NA> <NA>
# 4 4 yes <NA> <NA>
# 5 5 no yes <NA>
# 6 6 yes no <NA>
# 7 7 no no yes
# 8 8 no no yes
# 9 9 no yes yes
# 10 10 yes no yes
# 11 11 <NA> no no
# 12 12 <NA> yes yes
# 13 13 <NA> yes no
# 14 14 <NA> no no
# 15 15 <NA> <NA> no
# 16 16 <NA> <NA> yes
Upvotes: 0