Reputation: 309
I need to merge two lists of data frames by two key variables, ID and DATE. Here is an example of the data that I have:
names1 <- c("df1", "df2")
mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02- 29"), SCORE =c(677,635))
mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
list1 <- list(mydf1,mydf2)
names(list1) <- names1
names2 <- c("df_auto1", "df_auto2")
mydf_auto1 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708))
mydf_auto2 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
list2 <- list(mydf_auto1,mydf_auto2)
names(list2) <- names2
I tried to use Map function but the output I got is messed up. Here is what I tried to do:
V <-Map(merge, list1, list2,MoreArgs=list(by=c('ID','DATE'), all=TRUE))
for (i in seq_along(V)) {
write.csv(V[[i]], paste0("merge_",i, ".csv"))
}
As the final output, I'd like to get one dataframe with ID = 115477 and fully populated variables such as DATE, SCORE and Fprice; another dataframe with ID = 22319 and fully populated as well. For example, for ID = 115477 I'd like to get:
ID DATE SCORE Fprice
115477 2012-01-31 677 NA
115477 2012-02-29 635 6543
Does anyone have any idea of what I am doing wrong? Thank you for your help.
Upvotes: 2
Views: 1462
Reputation: 2920
Conduct the merge()
inside of mapply()
.
The end result is a list containing two data frames, each one the result of jth element in list2
being outer joined onto the ith element in list1
.
Note: There was a typo in the second DATE
element within mydf1
that is corrected below. My answer depends on the contents of list1
and list2
possessing data frames that contain the same ID
value, in the same order. As the OP has it arranged, mydf_auto2
is set to be merged onto mydf1
; whereas mydf_auto2
should be merged onto mydf2
based on these two data frames sharing the same ID
value. I revise the ordering within list2
to produce the desired output.
# create first list of data frames
names1 <- c("df1", "df2")
# note the extra spacing in "2012-02-29" has been corrected
mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), SCORE =c(677,635))
mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
list1 <- list(mydf1,mydf2)
names(list1) <- names1
# create second list of data frames
names2 <- c("df_auto1", "df_auto2")
# here is where I relabel the data frames
# to sync with `mydf1` and `mydf2` based on
# the `ID` values contained in `mydf_auto1` and `mydf_auto2`
mydf_auto1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
mydf_auto2 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708))
list2 <- list(mydf_auto1,mydf_auto2)
names(list2) <- names2
# merge the list of data frames together
merged.list.of.dfs <-
mapply( FUN = function( i, j )
merge( x = i
, y = j
, by = c( "ID", "DATE" )
, all = TRUE )
, list1
, list2
, SIMPLIFY = FALSE )
# view results
merged.list.of.dfs
# $df1
# ID DATE SCORE Fprice
# 3 115477 2012-01-31 677 NA
# 4 115477 2012-02-29 635 6543
#
# $df2
# ID DATE SCORE Fprice
# 1 22319 2011-09-30 621 8708
# 2 22319 2011-10-31 630 8708
# end of script #
Upvotes: 1
Reputation: 50678
Here is a tidyverse
approach:
library(tidyverse);
list(bind_rows(list1), bind_rows(list2)) %>%
reduce(function(x, y) full_join(x, y, by = c("ID", "DATE"))) %>%
filter(ID %in% c(115477))
# ID DATE SCORE Fprice
#1 115477 2012-01-31 677 NA
#2 115477 2012-02-29 635 6543
Explanation: For each list
we bind rows into a single data.frame
; we collect the two collapsed data.frame
s in a list
and then perform an outer join by "ID"
and "DATE"
; we use dplyr::filter
to pull out the rows of interest (here ID==115477
).
Upvotes: 1
Reputation: 301
It would be easier for you to do a merge
, then separately extract the IDs you want
names1 <- c("df1", "df2")
mydf1 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), SCORE =c(677,635))
mydf2 <- data.frame(ID=c(22319, 22319), DATE=c("2011-09-30","2011-10-31"), SCORE = c(621,630))
# Note the change to use of rbind instead of list
list1 <- rbind(mydf1, mydf2)
names2 <- c("df_auto1", "df_auto2")
mydf_auto1 <- data.frame(ID=c(22319, 22319),DATE=c("2011-09-30","2011-10-31") , Fprice =c(8708,8708))
mydf_auto2 <- data.frame(ID=c(115477, 115477), DATE=c("2012-01-31","2012-02-29"), Fprice = c(NA,6543))
list2 <- rbind(mydf_auto1,mydf_auto2)
df <- merge(list1, list2, by = c("ID", "DATE"))
df[df$ID == 115477,]
df[df$ID == 22319, ]
Upvotes: 0