Reputation: 1
I want to merge several fields from two dataframes into a new dataframe. The merge is keyed on ID and Date, and the Date must equal or fall between the start and end dates in the second dataframe.
The following answer to a similar question almost works for me. However, if the Date in the first dataframe equals the start date in the second dataframe, I get NA instead of the matching colour. Any help on ways to include the colour when the Date falls on the start date would be very much appreciated.
library(tidyverse)
library(lubridate)
df1 <- data.frame(ID = c(1, 2, 2, 3),
                  actual.date = mdy('3/31/2017', '2/11/2016', '4/10/2016', '5/15/2015'))
df2 <- data.frame(ID = c(1, 1, 1, 2, 3),
                  start = mdy('1/1/2000', '4/1/2011', '3/31/2017', '2/11/2016', '1/12/2012'),
                  end = mdy('3/31/2011', '6/4/2012', '04/04/2017', '3/31/2017', '2/12/2014'),
                  colour = c("blue", "purple", "blue", "red", "purple"))
df <- full_join(df1, df2, by = "ID") %>%
  mutate(test = ifelse(actual.date <= end & actual.date > start,
                       TRUE,
                       FALSE)) %>%
  filter(test) %>%
  left_join(df1, ., by = c("ID", "actual.date")) %>%
  select(ID, actual.date, colour)
Upvotes: 0
Views: 158
Reputation: 87
It would help if you could show a dataframe of the output you're looking for, but I think this may achieve what you're trying to do. I don't think you need to join twice as in the code above. The filter() drops the rows that fail the date test, so when you join back onto df1 those rows exist in only one of the dataframes and show up as NA. Note also that your test uses actual.date > start, which excludes a Date that falls exactly on the start date; the version below uses >= so the start date is included.
full_join(df1, df2, by = "ID") %>%
  filter(actual.date <= end & actual.date >= start) %>%
  select(ID, actual.date, colour)
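One caveat with the filter-only version: a row of df1 whose Date falls in no range at all (ID 3 in the sample data) is dropped rather than kept with NA. If you want to keep every row of df1, here is a rough sketch of an alternative. It assumes dplyr 1.1.0 or later, where join_by() accepts a between() helper whose bounds are inclusive by default. The name `result` is just for illustration, and I've built the dates with as.Date() so the snippet only needs dplyr.

```r
library(dplyr)

# Same sample data as in the question, with dates built via as.Date()
df1 <- data.frame(
  ID = c(1, 2, 2, 3),
  actual.date = as.Date(c("2017-03-31", "2016-02-11", "2016-04-10", "2015-05-15"))
)
df2 <- data.frame(
  ID = c(1, 1, 1, 2, 3),
  start = as.Date(c("2000-01-01", "2011-04-01", "2017-03-31", "2016-02-11", "2012-01-12")),
  end = as.Date(c("2011-03-31", "2012-06-04", "2017-04-04", "2017-03-31", "2014-02-12")),
  colour = c("blue", "purple", "blue", "red", "purple")
)

# Left join on ID plus an inclusive range condition (start <= actual.date <= end).
# Rows of df1 with no matching range are kept, with colour set to NA.
# Assumes dplyr >= 1.1.0 for join_by() and its between() helper.
result <- df1 %>%
  left_join(df2, by = join_by(ID, between(actual.date, start, end))) %>%
  select(ID, actual.date, colour)

print(result)
```

With the sample data this should give blue, red, red and NA: both dates that sit exactly on a start date pick up their colour, and ID 3 stays in the result with NA. If a Date can fall inside more than one range for the same ID you will get one output row per match, so overlapping ranges in df2 would need handling separately.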
Upvotes: 0