Count instances of datetime overlap across all rows in R dataframe

Question

Hoping someone can assist me here. I've tried searching, but nothing seems to match what I'm trying to do.

I'm trying to calculate, for each row in my dataframe, the number of rows (including the row itself) whose time range contains that row's datetime.

I have a data frame which contains 3 datetime columns, which are POSIXt, format = dd/mm/yyyy HH:MM.

I'd like my calculation to be in a new column called "duplicates".

| Row | Start_time       | Start_time_beg   | Start_time_end   |
|-----|------------------|------------------|------------------|
| 1   | 01/01/2017 03:00 | 01/01/2017 01:30 | 01/01/2017 04:30 |
| 2   | 01/01/2017 04:00 | 01/01/2017 02:30 | 01/01/2017 05:30 |
| 3   | 01/01/2017 04:10 | 01/01/2017 02:40 | 01/01/2017 05:40 |
| 4   | 01/01/2017 05:00 | 01/01/2017 03:30 | 01/01/2017 06:30 |
| 5   | 01/01/2017 08:00 | 01/01/2017 06:30 | 01/01/2017 09:30 |

So in the above example data, I'd like to count every instance where a row's Start_time falls in the range Start_time_beg : Start_time_end of rows 1:n.

The results for this data would be:

| Row | Duplicates | Overlaps with rows      |
|-----|------------|-------------------------|
| 1   | 3          | 1, 2, 3                 |
| 2   | 4          | 1, 2, 3, 4              |
| 3   | 4          | 1, 2, 3, 4              |
| 4   | 3          | 2, 3, 4                 |
| 5   | 1          | 5 (only itself)         |
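
To make the counting rule concrete, here is a minimal sketch that applies it directly to the example data, comparing each row's start time against every row's range. It assumes the column names `Start_Time`, `Start_Time_Beg` and `Start_Time_End`, inclusive range endpoints, and a UTC timezone; the timezone and the inclusive endpoints are assumptions, not something stated above.

```r
# Example data rebuilt from the table above (timezone "UTC" is an assumption).
fmt <- "%d/%m/%Y %H:%M"
d1 <- data.frame(
  Start_Time     = as.POSIXct(c("01/01/2017 03:00", "01/01/2017 04:00", "01/01/2017 04:10",
                                "01/01/2017 05:00", "01/01/2017 08:00"), format = fmt, tz = "UTC"),
  Start_Time_Beg = as.POSIXct(c("01/01/2017 01:30", "01/01/2017 02:30", "01/01/2017 02:40",
                                "01/01/2017 03:30", "01/01/2017 06:30"), format = fmt, tz = "UTC"),
  Start_Time_End = as.POSIXct(c("01/01/2017 04:30", "01/01/2017 05:30", "01/01/2017 05:40",
                                "01/01/2017 06:30", "01/01/2017 09:30"), format = fmt, tz = "UTC")
)

# For row i, count the rows whose [Beg, End] range contains row i's Start_Time.
# The row itself is included, which is why row 5 gets a count of 1.
d1$duplicates <- vapply(seq_len(nrow(d1)), function(i) {
  sum(d1$Start_Time[i] >= d1$Start_Time_Beg &
      d1$Start_Time[i] <= d1$Start_Time_End)
}, integer(1))

print(d1$duplicates)
```

This compares every row against every other row, so the work grows with the square of the row count; that should be fine for a few thousand rows but may need a different approach for much larger data.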

My thought was to create a seq array for each Start_time_beg:Start_time_end, then create a data frame with a count of each Start_time from that. I could then join this back onto the original df.

So far I have:

x <- d1$Start_Time
y <- d1$Start_Time_Beg
z <- d1$Start_Time_End


t <- seq(y[1],z[1],"mins")
t2<- seq(y[2],z[2],"mins")

tn <- c(t,t2)

p<-count(tn,'tn')

Which gives me the desired df (p) from the time range array. The problem is that I have tried to create a loop to generate t for every row (the rows go into the thousands, so they can't be typed manually), but I'm having no luck:

for (i in 1:length(d1$Start_Time))
{seq(d$Start_Time_Beg[c(1+i)],d$Start_Time_End[c(1+i)],"mins")}

This just gives me an int of length = nrows, not the array of datetimes I was after.
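
A `for` loop in R does not keep the value computed in each iteration, so the sequences have to be stored explicitly. The sketch below shows one way the seq-based idea could be completed: it builds one minute-by-minute sequence per row with `lapply`, pools them, and counts how often each start time appears. The example data, the UTC timezone, and the names `ranges` and `all_minutes` are assumptions made for illustration, and it relies on all times falling on whole minutes.

```r
# Example data rebuilt from the table in the question (timezone "UTC" is an assumption).
fmt <- "%d/%m/%Y %H:%M"
d1 <- data.frame(
  Start_Time     = as.POSIXct(c("01/01/2017 03:00", "01/01/2017 04:00", "01/01/2017 04:10",
                                "01/01/2017 05:00", "01/01/2017 08:00"), format = fmt, tz = "UTC"),
  Start_Time_Beg = as.POSIXct(c("01/01/2017 01:30", "01/01/2017 02:30", "01/01/2017 02:40",
                                "01/01/2017 03:30", "01/01/2017 06:30"), format = fmt, tz = "UTC"),
  Start_Time_End = as.POSIXct(c("01/01/2017 04:30", "01/01/2017 05:30", "01/01/2017 05:40",
                                "01/01/2017 06:30", "01/01/2017 09:30"), format = fmt, tz = "UTC")
)

# One sequence of minutes per row, kept in a list (index i, not 1 + i).
ranges <- lapply(seq_len(nrow(d1)), function(i) {
  seq(d1$Start_Time_Beg[i], d1$Start_Time_End[i], by = "min")
})

# Pool every minute from every range into a single POSIXct vector.
all_minutes <- do.call(c, ranges)

# A start time appears once in the pool for each range that contains it.
d1$duplicates <- vapply(seq_len(nrow(d1)), function(i) {
  sum(all_minutes == d1$Start_Time[i])
}, integer(1))

print(d1$duplicates)
```

Expanding every range to minutes uses a lot of memory when there are thousands of rows, so comparing each start time against the range endpoints directly is likely the lighter option.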

I'm not even sure if this is the right way to go about this. I've had a bash at trying to use dplyr, but no luck.

Any help much appreciated. Apologies, my tables don't seem to have aligned properly.

Thanks in advance for any help
