QPaps
QPaps

Reputation: 312

Merge more than 2 dataframes together depending on column value in R

I would like to merge a dataframe with multiple dataframes depending on a value in two columns. I know I can merge two dataframes based on an element in a column using the merge() function, but I dont know how to do it when there are more than 2 dataframes.

For example, take this as the primary dataframe:

yr.col <- c(rep("2018",3), rep("2017",4), rep("2016",5))
mnth.col <- sample.int(4,12, replace = TRUE)
lon <- c(paste(1:12,"x"))
lat <- c(paste(1:12,"y"))

df <- data.frame(yr.col,lon,lat)

These are the other dataframes, which have the temperature for the set of lon and lat in different years.

tmp_18 <- sample.int(8,12,replace = TRUE)
tmp_17 <- sample.int(8,12,replace = TRUE)
tmp_16 <- sample.int(8,12,replace = TRUE)

env_18 <- data.frame(tmp_18,lon,lat)
env_17 <- data.frame(tmp_17, lon, lat)
env_16 <- data.frame(tmp_16, lon, lat)

Aim: I want to merge df with either env_18 env_17 or env_16 depending on df$yr.col

Expected result: A dataframe with a new column called tmp where the number from the correct env datasets are found

Previously tried:

1)

if (df$yr.col=="2018"){
  df.new$tmp <- merge(df,env_18, by=c("lon", "lat"))
  df.new$tmp.yr <- "2018"
}else if (df$yr.col=="2017"){
  df.new$tmp <- merge(df, env_17, by=c("lon", "lat"))
  df.new$tmp.yr <- "2017"
} else {
  df.new$tmp <- merge(df, env_16, by=c("lon", "lat"))
  df.new$tmp.yr <- "2016"}

produces this warning:

Warning message:
In if (df$yr.col == "2018") { :
  the condition has length > 1 and only the first element will be used

It only takes the first dataframe env_18 and merges that with df

I have also tried 2)

df.new <- ifelse(df$yr.col=="2018", merge(df, env_18, by=c("lon", "lat")), 
                       ifelse(df$yr.col=="2017", merge(df, env_17, by=c("lon", "lat")),
                              ifelse(df$yr.col=="2016", merge(df, env_16, by=c("lon", "lat")), "NA")))
df.new <- data.frame(matrix(unlist(df.new), nrow=length(df.new)))

but this does not give the desired outcome.

Is there some magic way to do this that I have not condisered or have I made an error? Perhaps a for-loop or function?
Thank you so much for your help in advance! I really appreciate it :))

Upvotes: 1

Views: 74

Answers (1)

slava-kohut
slava-kohut

Reputation: 4233

You can use dplyr and purrr for that. I could have used inner_join, but decided to keep merge as in the original post.

map2_dfr(list(env_16, env_17, env_18),
                2016:2018,
                function(x,y){merge(df %>% filter(yr.col == y), x, by=c("lon", "lat"))})

Output

lon  lat yr.col tmp_16 tmp_17 tmp_18
1  10 x 10 y   2016      1     NA     NA
2  11 x 11 y   2016      8     NA     NA
3  12 x 12 y   2016      7     NA     NA
4   8 x  8 y   2016      7     NA     NA
5   9 x  9 y   2016      2     NA     NA
6   4 x  4 y   2017     NA      5     NA
7   5 x  5 y   2017     NA      4     NA
8   6 x  6 y   2017     NA      8     NA
9   7 x  7 y   2017     NA      7     NA
10  1 x  1 y   2018     NA     NA      6
11  2 x  2 y   2018     NA     NA      2
12  3 x  3 y   2018     NA     NA      1

You can also create one column from tmps and drop the rest:

df$tmp <- coalesce(df$tmp_16, df$tmp_17, df$tmp_18)

Upvotes: 1

Related Questions