Reputation: 312
I would like to merge a dataframe with multiple dataframes depending on a value in two columns. I know I can merge two dataframes based on an element in a column using the merge()
function, but I dont know how to do it when there are more than 2 dataframes.
For example, take this as the primary dataframe:
yr.col <- c(rep("2018",3), rep("2017",4), rep("2016",5))
mnth.col <- sample.int(4,12, replace = TRUE)
lon <- c(paste(1:12,"x"))
lat <- c(paste(1:12,"y"))
df <- data.frame(yr.col,lon,lat)
These are the other dataframes, which have the temperature for the set of lon and lat in different years.
tmp_18 <- sample.int(8,12,replace = TRUE)
tmp_17 <- sample.int(8,12,replace = TRUE)
tmp_16 <- sample.int(8,12,replace = TRUE)
env_18 <- data.frame(tmp_18,lon,lat)
env_17 <- data.frame(tmp_17, lon, lat)
env_16 <- data.frame(tmp_16, lon, lat)
Aim: I want to merge df
with either env_18
env_17
or env_16
depending on df$yr.col
Expected result: A dataframe with a new column called tmp
where the number from the correct env datasets are found
Previously tried:
1)
if (df$yr.col=="2018"){
df.new$tmp <- merge(df,env_18, by=c("lon", "lat"))
df.new$tmp.yr <- "2018"
}else if (df$yr.col=="2017"){
df.new$tmp <- merge(df, env_17, by=c("lon", "lat"))
df.new$tmp.yr <- "2017"
} else {
df.new$tmp <- merge(df, env_16, by=c("lon", "lat"))
df.new$tmp.yr <- "2016"}
produces this warning:
Warning message:
In if (df$yr.col == "2018") { :
the condition has length > 1 and only the first element will be used
It only takes the first dataframe env_18
and merges that with df
I have also tried 2)
df.new <- ifelse(df$yr.col=="2018", merge(df, env_18, by=c("lon", "lat")),
ifelse(df$yr.col=="2017", merge(df, env_17, by=c("lon", "lat")),
ifelse(df$yr.col=="2016", merge(df, env_16, by=c("lon", "lat")), "NA")))
df.new <- data.frame(matrix(unlist(df.new), nrow=length(df.new)))
but this does not give the desired outcome.
Is there some magic way to do this that I have not condisered or have I made an error? Perhaps a for-loop
or function
?
Thank you so much for your help in advance! I really appreciate it :))
Upvotes: 1
Views: 74
Reputation: 4233
You can use dplyr
and purrr
for that. I could have used inner_join
, but decided to keep merge
as in the original post.
map2_dfr(list(env_16, env_17, env_18),
2016:2018,
function(x,y){merge(df %>% filter(yr.col == y), x, by=c("lon", "lat"))})
Output
lon lat yr.col tmp_16 tmp_17 tmp_18
1 10 x 10 y 2016 1 NA NA
2 11 x 11 y 2016 8 NA NA
3 12 x 12 y 2016 7 NA NA
4 8 x 8 y 2016 7 NA NA
5 9 x 9 y 2016 2 NA NA
6 4 x 4 y 2017 NA 5 NA
7 5 x 5 y 2017 NA 4 NA
8 6 x 6 y 2017 NA 8 NA
9 7 x 7 y 2017 NA 7 NA
10 1 x 1 y 2018 NA NA 6
11 2 x 2 y 2018 NA NA 2
12 3 x 3 y 2018 NA NA 1
You can also create one column from tmp
s and drop the rest:
df$tmp <- coalesce(df$tmp_16, df$tmp_17, df$tmp_18)
Upvotes: 1