metaltoaster
metaltoaster

Reputation: 378

Count rows between two timestamps from two different dataframes in R

I am trying to count the number of rows between two timestamps in one data frame using timestamps that are supplied from another. Here are two mock data frames, similar to what I am working with:-

Data frame 1

#df1===========================================================

UserId1<-c("9f649f366edf", "9f649f366edf", "9f649f366edf", "9f649f366edf", 
           "9f649f366edf", "9f649f366edf", "9f649f366edf", "9f649f366edf", 
           "9f649f366edf", "6bc397bdc516", "6bc397bdc516", "6bc397bdc516", 
           "6bc397bdc516", "6bc397bdc516", "6bc397bdc516", "6bc397bdc516", 
           "6bc397bdc516", "f24cbff4c81e", "f24cbff4c81e", 
           "f24cbff4c81e", "f24cbff4c81e", "f24cbff4c81e")


Status1<-c("Abandoned", "Abandoned", "Answered", "Answered", "Abandoned", 
           "Abandoned", "Abandoned", "Abandoned", "Abandoned", "Abandoned", 
           "Abandoned", "Abandoned", "Abandoned", "Abandoned", "Abandoned", 
           "Abandoned", "Abandoned", "Abandoned", "Abandoned", 
           "Abandoned", "Answered", "Answered")

DateTime<-structure(c(1548029115, 1548035560, 1548099858, 1548099996, 1548824396, 
                      1548824737, 1548824927, 1548825554, 1548825793, 1576821965, 1576821999, 
                      1576822013, 1576822152, 1576822484, 1576865566, 1576926050, 1577037551, 
                      1560000877, 1560001005, 1560013996, 1560014372, 1560186676
), class = c("POSIXct", "POSIXt"), tzone = "Europe/London")


df1<-data.frame(UserId1,Status1,DateTime)

df1$DateTime<-as.numeric(df1$DateTime)
df1$DateTime<-as.POSIXct(df1$DateTime, origin = "1970-01-01 00:00:00")

colnames(df1)<-c("UserId","Status","DateTime")

View(df1)


View(df1)

Data frame 2

UserId2<-c("9f649f366edf", "9f649f366edf", "9f649f366edf", "9f649f366edf", 
           "9f649f366edf", "6bc397bdc516", "6bc397bdc516", "6bc397bdc516", 
           "6bc397bdc516", "f24cbff4c81e", "f24cbff4c81e", "f24cbff4c81e"
)

OrigTime<-structure(c(1548029115, 1548035560, 1548099858, 1548099996, 1548824396, 
                      1576821965, 1576865566, 1576926050, 1577037551, 1560000877, 1560013996, 
                      1560186676), class = c("POSIXct", "POSIXt"), tzone = "Europe/London")

LastTime<-structure(c(1548029115, 1548035560, 1548099858, 1548099996, 1548825793, 
                      1576822484, 1576865566, 1576926050, 1577037551, 1560001005, 1560014372, 
                      1560186676), class = c("POSIXct", "POSIXt"), tzone = "Europe/London")




Status<-c("Abandoned", "Abandoned", "Answered", "Answered", "Abandoned", 
          "Abandoned", "Abandoned", "Abandoned", "Abandoned", "Abandoned", 
          "Answered", "Answered")



calls<-c(1, 1, 1, 1, 6, 6, 1, 1, 1, 3, 4, 1)

df2<-data.frame(UserId2,OrigTime,LastTime,Status,calls)


df2$OrigTime<-as.numeric(df2$OrigTime)
df2$OrigTime<-as.POSIXct(df2$OrigTime, origin = "1970-01-01 00:00:00")
df2$LastTime<-as.numeric(df2$LastTime)
df2$LastTime<-as.POSIXct(df2$LastTime, origin = "1970-01-01 00:00:00")

colnames(df2)<-c("UserId","OrigTime","LastTime","Status","calls")

View(df2)

and here are the respective outputs of both data frames:-

#df1

        UserId    Status            DateTime
1  9f649f366edf Abandoned 2019-01-21 00:05:15
2  9f649f366edf Abandoned 2019-01-21 01:52:40
3  9f649f366edf  Answered 2019-01-21 19:44:18
4  9f649f366edf  Answered 2019-01-21 19:46:36
5  9f649f366edf Abandoned 2019-01-30 04:59:56
6  9f649f366edf Abandoned 2019-01-30 05:05:37
7  9f649f366edf Abandoned 2019-01-30 05:08:47
8  9f649f366edf Abandoned 2019-01-30 05:19:14
9  9f649f366edf Abandoned 2019-01-30 05:23:13
10 6bc397bdc516 Abandoned 2019-12-20 06:06:05
11 6bc397bdc516 Abandoned 2019-12-20 06:06:39
12 6bc397bdc516 Abandoned 2019-12-20 06:06:53
13 6bc397bdc516 Abandoned 2019-12-20 06:09:12
14 6bc397bdc516 Abandoned 2019-12-20 06:14:44
15 6bc397bdc516 Abandoned 2019-12-20 18:12:46
16 6bc397bdc516 Abandoned 2019-12-21 11:00:50
17 6bc397bdc516 Abandoned 2019-12-22 17:59:11
18 f24cbff4c81e Abandoned 2019-06-08 14:34:37
19 f24cbff4c81e Abandoned 2019-06-08 14:36:45
20 f24cbff4c81e Abandoned 2019-06-08 18:13:16
21 f24cbff4c81e  Answered 2019-06-08 18:19:32
22 f24cbff4c81e  Answered 2019-06-10 18:11:16
#df2

         UserId            OrigTime            LastTime    Status calls
1  9f649f366edf 2019-01-21 00:05:15 2019-01-21 00:05:15 Abandoned     1
2  9f649f366edf 2019-01-21 01:52:40 2019-01-21 01:52:40 Abandoned     1
3  9f649f366edf 2019-01-21 19:44:18 2019-01-21 19:44:18  Answered     1
4  9f649f366edf 2019-01-21 19:46:36 2019-01-21 19:46:36  Answered     1
5  9f649f366edf 2019-01-30 04:59:56 2019-01-30 05:23:13 Abandoned     6
6  6bc397bdc516 2019-12-20 06:06:05 2019-12-20 06:14:44 Abandoned     6
7  6bc397bdc516 2019-12-20 18:12:46 2019-12-20 18:12:46 Abandoned     1
8  6bc397bdc516 2019-12-21 11:00:50 2019-12-21 11:00:50 Abandoned     1
9  6bc397bdc516 2019-12-22 17:59:11 2019-12-22 17:59:11 Abandoned     1
10 f24cbff4c81e 2019-06-08 14:34:37 2019-06-08 14:36:45 Abandoned     3
11 f24cbff4c81e 2019-06-08 18:13:16 2019-06-08 18:19:32  Answered     4
12 f24cbff4c81e 2019-06-10 18:11:16 2019-06-10 18:11:16  Answered     1

I am trying to count the rows in df1$DateTime that appear between the OrigTime and LastTime column in df2, as per UserId, because the calls column in df2 is wrong for some entries (a "call" is a single row within df1).

Here is an example of the output that I want df2 to look like; here is before:-

Using UserId== "f24cbff4c81e" as an example; this user only has 5 rows (calls) within df1, but if you tally up the calls in df2, this user has 8. See the following before and after:-

Before

df2%>%filter(UserId=="f24cbff4c81e")
         UserId            OrigTime            LastTime    Status calls
1 f24cbff4c81e 2019-06-08 14:34:37 2019-06-08 14:36:45 Abandoned     3
2 f24cbff4c81e 2019-06-08 18:13:16 2019-06-08 18:19:32  Answered     4
3 f24cbff4c81e 2019-06-10 18:11:16 2019-06-10 18:11:16  Answered     1  


The calls column is wrong as there is not as many rows between OrigTime and LastTime within df1$DateTime. Here is the correct outcome of which I want:-

Correct outcome

df2%>%filter(UserId=="f24cbff4c81e")
         UserId            OrigTime            LastTime    Status calls
1 f24cbff4c81e 2019-06-08 14:34:37 2019-06-08 14:36:45 Abandoned     2
2 f24cbff4c81e 2019-06-08 18:13:16 2019-06-08 18:19:32  Answered     2
3 f24cbff4c81e 2019-06-10 18:11:16 2019-06-10 18:11:16  Answered     1

For UserId=="f24cbff4c81e", there are 2 calls between 2019-06-08 14:34:37 (OrigTime in df2) and 2019-06-08 14:36:45 (LastTime in df2) in df1, 2 calls between 2019-06-08 18:13:16 (OrigTime in df2) and 2019-06-08 18:19:32 (LastTime in df2) in df1, and 1 call that occurred at 2019-06-10 18:11:16 (both OrigTime and LastTime; I'd like to keep the timestamps preserved here if possible) in df1. I hope you can see the logic of what I'm trying to achieve for the other users.

To conclude, if the number of rows between timestamps in df1 (as specified by OrigTime and LastTime in df2) do not match the calls value in the respective row in df2, I want it changed to the correct value. Any help is massively appreciated :)

Upvotes: 1

Views: 108

Answers (1)

Aur&#232;le
Aur&#232;le

Reputation: 12819

We could do:

library(dplyr)

count_calls <- function(UserId, OrigTime, LastTime, df_calls = df1) {
  df_calls %>% 
    dplyr::filter(UserId == !! UserId, 
                  dplyr::between(DateTime, OrigTime, LastTime)) %>% 
    nrow()
}

df2 %>% 
  mutate(calls_recount = pmap_int(list(UserId, OrigTime, LastTime), count_calls),
         mismatch = calls != calls_recount)

#>          UserId            OrigTime            LastTime    Status calls calls_recount mismatch
#> 1  9f649f366edf 2019-01-21 01:05:15 2019-01-21 01:05:15 Abandoned     1             1    FALSE
#> 2  9f649f366edf 2019-01-21 02:52:40 2019-01-21 02:52:40 Abandoned     1             1    FALSE
#> 3  9f649f366edf 2019-01-21 20:44:18 2019-01-21 20:44:18  Answered     1             1    FALSE
#> 4  9f649f366edf 2019-01-21 20:46:36 2019-01-21 20:46:36  Answered     1             1    FALSE
#> 5  9f649f366edf 2019-01-30 05:59:56 2019-01-30 06:23:13 Abandoned     6             5     TRUE
#> 6  6bc397bdc516 2019-12-20 07:06:05 2019-12-20 07:14:44 Abandoned     6             5     TRUE
#> 7  6bc397bdc516 2019-12-20 19:12:46 2019-12-20 19:12:46 Abandoned     1             1    FALSE
#> 8  6bc397bdc516 2019-12-21 12:00:50 2019-12-21 12:00:50 Abandoned     1             1    FALSE
#> 9  6bc397bdc516 2019-12-22 18:59:11 2019-12-22 18:59:11 Abandoned     1             1    FALSE
#> 10 f24cbff4c81e 2019-06-08 15:34:37 2019-06-08 15:36:45 Abandoned     3             2     TRUE
#> 11 f24cbff4c81e 2019-06-08 19:13:16 2019-06-08 19:19:32  Answered     4             2     TRUE
#> 12 f24cbff4c81e 2019-06-10 19:11:16 2019-06-10 19:11:16  Answered     1             1    FALSE

Upvotes: 2

Related Questions