Reputation: 339
I have three dataframes (df1, df2 and df3). I would like to merge them and also, fill each others gaps. For example: df1 contains USA data from 1990 to 1993 and df2 from 1994 to 1999. When I merge them using reduce function, I get duplicate data (USA.x, USA.y) instead of just USA with all consecutive dates and values.
I did as follows:
df1 <- data.frame( stringsAsFactors = FALSE, Date = c("01/01/1990", "01/01/1991", "01/01/1992", "01/01/1993", "01/01/1994", "01/01/1995"), USA = c(1, 4, 2, 1, NA, NA), FRANCE = c(4, 4, 2, 5, NA, NA), ITALY = c(1, 4, 5, 2, NA, NA)) df2 <-data.frame( stringsAsFactors = FALSE, Date = c("01/01/1994", "01/01/1995", "01/01/1996", "01/01/1997", "01/01/1998", "01/01/1999"), USA = c(3, 3, 1, 4, 3, 1), FRANCE = c(2, 5, 2, 5, 5, 1), MEXICO = c(4, 1, 4, 3, NA, NA)) df3 <- data.frame( stringsAsFactors = FALSE, Date = c("01/01/1998", "01/01/1999", "01/01/2000", "01/01/2001", "01/01/2002", "01/01/2003"), MEXICO = c(3, 3, 5, 4, 2, 3), BELGIUM = c(4, 2, 1, 4, 5, 1)) df_list <- list(df1, df2, df3)
Option 1
dfall1 <- Reduce(function(x, y) merge(x, y, all=TRUE), df_list, accumulate=FALSE)
View(dfall1)
Results dfall1
It integrates columns according to duplicated names but it duplicates the dates. There will be only one column for each contry but 2 rows of the same date: one with NA, the other with a value from another data frame.
Option 2
dfall2 <- Reduce(function(x, y) merge(x, y, all=TRUE, by = "Date"), df_list, accumulate=FALSE)
View(dfall2)
Results dfall2
Date rows are not repeated by column names will be renamed as .x and .y.
Question How can I avoid duplicated row and columns. I wish for all data to be integrated according to their column name and corresponding dates
I would like to end up with the following:
Any help would be truly appreciated. Thanks in advance.
Upvotes: 1
Views: 50
Reputation: 72758
Reshaping, e.g. using reshape2
package. First, reshape into long format, next rbind
and na.omit
, finally reshape wide.
> lapply(df_list, reshape2:::melt.data.frame, id.vars='Date') |>
+ do.call(what='rbind') |> na.omit() |> reshape2::dcast(Date ~ variable)
Date USA FRANCE ITALY MEXICO BELGIUM
1 01/01/1990 1 4 1 NA NA
2 01/01/1991 4 4 4 NA NA
3 01/01/1992 2 2 5 NA NA
4 01/01/1993 1 5 2 NA NA
5 01/01/1994 3 2 NA 4 NA
6 01/01/1995 3 5 NA 1 NA
7 01/01/1996 1 2 NA 4 NA
8 01/01/1997 4 5 NA 3 NA
9 01/01/1998 3 5 NA 3 4
10 01/01/1999 1 1 NA 3 2
11 01/01/2000 NA NA NA 5 1
12 01/01/2001 NA NA NA 4 4
13 01/01/2002 NA NA NA 2 5
14 01/01/2003 NA NA NA 3 1
Data:
> dput(df_list)
list(structure(list(Date = c("01/01/1990", "01/01/1991", "01/01/1992",
"01/01/1993", "01/01/1994", "01/01/1995"), USA = c(1, 4, 2, 1,
NA, NA), FRANCE = c(4, 4, 2, 5, NA, NA), ITALY = c(1, 4, 5, 2,
NA, NA)), class = "data.frame", row.names = c(NA, -6L)), structure(list(
Date = c("01/01/1994", "01/01/1995", "01/01/1996", "01/01/1997",
"01/01/1998", "01/01/1999"), USA = c(3, 3, 1, 4, 3, 1), FRANCE = c(2,
5, 2, 5, 5, 1), MEXICO = c(4, 1, 4, 3, NA, NA)), class = "data.frame", row.names = c(NA,
-6L)), structure(list(Date = c("01/01/1998", "01/01/1999", "01/01/2000",
"01/01/2001", "01/01/2002", "01/01/2003"), MEXICO = c(3, 3, 5,
4, 2, 3), BELGIUM = c(4, 2, 1, 4, 5, 1)), class = "data.frame", row.names = c(NA,
-6L)))
Upvotes: 1