NeReiS
NeReiS

Reputation: 97

Compare a Date in a data frame with two dates in other data frame

I have been reading similar posts I can't make any of the solutions work to my case (probably because I am new in R).

I have a long dataset with several parameters, one of which is date, and another data frame with date intervals with correspond to a particular value. I tried to make a reproducible example:

df = data.frame(date=c("2017/08/01 19:00:00","2017/08/01 19:10:00","2017/08/01 19:20:00","2017/08/01 19:30:00",
                   "2017/08/01 19:40:00","2017/08/01 19:50:00","2017/08/01 20:00:00","2017/08/01 20:10:00"),
            factor1=c(10,15,11,13,14,12,16,15))

df2 = data.frame(start=c("2017/08/01 19:00:00","2017/08/01 19:40:00"),
             end=c("2017/08/01 19:15:00","2017/08/01 20:05:00"), factor2=c("A","B"))

df$date <- as.POSIXct(df$date) 
df2$start <- as.POSIXct(df2$start)
df2$end  <- as.POSIXct(df2$end)

And the result I would want is something like this:

result = data.frame(date=c("2017/08/01 19:00:00","2017/08/01 19:10:00","2017/08/01 19:20:00","2017/08/01 19:30:00",
                   "2017/08/01 19:40:00","2017/08/01 19:50:00","2017/08/01 20:00:00","2017/08/01 20:10:00"),
            factor1=c(10,15,11,13,14,12,16,15),factor2=c("A","A","NA","NA","B","B","B","NA"))

I tried with ifelse:

ifelse(df$date >= df2$start & df$date <= df2$end,df2$factor2,"NA")

But can't make it work.

Any advice?

Upvotes: 0

Views: 1669

Answers (2)

Patrik_P
Patrik_P

Reputation: 3200

This works as well on your sample data:

result <- df
result$factor2 <- NA
for (i in seq_along(df$date)){
  p <- ifelse(length(grep("TRUE", (df$date[i] >= df2$start & df$date[i] <= df2$end)))!=0,
              grep("TRUE", (df$date[i] >= df2$start & df$date[i] <= df2$end)),
              NA)
  result$factor2[i] <- ifelse(!is.na(p),
                          as.character(df2$factor2[p]),
                          "NA")
  }
print(result)
#                 date factor1 factor2
#1 2017-08-01 19:00:00      10       A
#2 2017-08-01 19:10:00      15       A
#3 2017-08-01 19:20:00      11      NA
#4 2017-08-01 19:30:00      13      NA
#5 2017-08-01 19:40:00      14       B
#6 2017-08-01 19:50:00      12       B
#7 2017-08-01 20:00:00      16       B
#8 2017-08-01 20:10:00      15      NA

Upvotes: 1

lebelinoz
lebelinoz

Reputation: 5068

I tried using inner join in sqldf and it seemed to work:

library(sqldf)
df3 = sqldf("select df.*, df2.factor2 from df inner join df2 where df.date >= df2.start and df.date <= df2.end")
result = merge(df, df3, by = "date", all.x = TRUE)

(Note that I also tried with left outer join instead of inner join, but this gave me the same results as the inner join... It must be a fluke in sqldf)

Upvotes: 0

Related Questions