Reputation: 378
I am trying to count the number of rows between two timestamps in one data frame using timestamps that are supplied from another. Here are two mock data frames, similar to what I am working with:-
Data frame 1
#df1===========================================================
UserId1<-c("9f649f366edf", "9f649f366edf", "9f649f366edf", "9f649f366edf",
"9f649f366edf", "9f649f366edf", "9f649f366edf", "9f649f366edf",
"9f649f366edf", "6bc397bdc516", "6bc397bdc516", "6bc397bdc516",
"6bc397bdc516", "6bc397bdc516", "6bc397bdc516", "6bc397bdc516",
"6bc397bdc516", "f24cbff4c81e", "f24cbff4c81e",
"f24cbff4c81e", "f24cbff4c81e", "f24cbff4c81e")
Status1<-c("Abandoned", "Abandoned", "Answered", "Answered", "Abandoned",
"Abandoned", "Abandoned", "Abandoned", "Abandoned", "Abandoned",
"Abandoned", "Abandoned", "Abandoned", "Abandoned", "Abandoned",
"Abandoned", "Abandoned", "Abandoned", "Abandoned",
"Abandoned", "Answered", "Answered")
DateTime<-structure(c(1548029115, 1548035560, 1548099858, 1548099996, 1548824396,
1548824737, 1548824927, 1548825554, 1548825793, 1576821965, 1576821999,
1576822013, 1576822152, 1576822484, 1576865566, 1576926050, 1577037551,
1560000877, 1560001005, 1560013996, 1560014372, 1560186676
), class = c("POSIXct", "POSIXt"), tzone = "Europe/London")
df1<-data.frame(UserId1,Status1,DateTime)
df1$DateTime<-as.numeric(df1$DateTime)
df1$DateTime<-as.POSIXct(df1$DateTime, origin = "1970-01-01 00:00:00")
colnames(df1)<-c("UserId","Status","DateTime")
View(df1)
View(df1)
Data frame 2
UserId2<-c("9f649f366edf", "9f649f366edf", "9f649f366edf", "9f649f366edf",
"9f649f366edf", "6bc397bdc516", "6bc397bdc516", "6bc397bdc516",
"6bc397bdc516", "f24cbff4c81e", "f24cbff4c81e", "f24cbff4c81e"
)
OrigTime<-structure(c(1548029115, 1548035560, 1548099858, 1548099996, 1548824396,
1576821965, 1576865566, 1576926050, 1577037551, 1560000877, 1560013996,
1560186676), class = c("POSIXct", "POSIXt"), tzone = "Europe/London")
LastTime<-structure(c(1548029115, 1548035560, 1548099858, 1548099996, 1548825793,
1576822484, 1576865566, 1576926050, 1577037551, 1560001005, 1560014372,
1560186676), class = c("POSIXct", "POSIXt"), tzone = "Europe/London")
Status<-c("Abandoned", "Abandoned", "Answered", "Answered", "Abandoned",
"Abandoned", "Abandoned", "Abandoned", "Abandoned", "Abandoned",
"Answered", "Answered")
calls<-c(1, 1, 1, 1, 6, 6, 1, 1, 1, 3, 4, 1)
df2<-data.frame(UserId2,OrigTime,LastTime,Status,calls)
df2$OrigTime<-as.numeric(df2$OrigTime)
df2$OrigTime<-as.POSIXct(df2$OrigTime, origin = "1970-01-01 00:00:00")
df2$LastTime<-as.numeric(df2$LastTime)
df2$LastTime<-as.POSIXct(df2$LastTime, origin = "1970-01-01 00:00:00")
colnames(df2)<-c("UserId","OrigTime","LastTime","Status","calls")
View(df2)
and here are the respective outputs of both data frames:-
#df1
UserId Status DateTime
1 9f649f366edf Abandoned 2019-01-21 00:05:15
2 9f649f366edf Abandoned 2019-01-21 01:52:40
3 9f649f366edf Answered 2019-01-21 19:44:18
4 9f649f366edf Answered 2019-01-21 19:46:36
5 9f649f366edf Abandoned 2019-01-30 04:59:56
6 9f649f366edf Abandoned 2019-01-30 05:05:37
7 9f649f366edf Abandoned 2019-01-30 05:08:47
8 9f649f366edf Abandoned 2019-01-30 05:19:14
9 9f649f366edf Abandoned 2019-01-30 05:23:13
10 6bc397bdc516 Abandoned 2019-12-20 06:06:05
11 6bc397bdc516 Abandoned 2019-12-20 06:06:39
12 6bc397bdc516 Abandoned 2019-12-20 06:06:53
13 6bc397bdc516 Abandoned 2019-12-20 06:09:12
14 6bc397bdc516 Abandoned 2019-12-20 06:14:44
15 6bc397bdc516 Abandoned 2019-12-20 18:12:46
16 6bc397bdc516 Abandoned 2019-12-21 11:00:50
17 6bc397bdc516 Abandoned 2019-12-22 17:59:11
18 f24cbff4c81e Abandoned 2019-06-08 14:34:37
19 f24cbff4c81e Abandoned 2019-06-08 14:36:45
20 f24cbff4c81e Abandoned 2019-06-08 18:13:16
21 f24cbff4c81e Answered 2019-06-08 18:19:32
22 f24cbff4c81e Answered 2019-06-10 18:11:16
#df2
UserId OrigTime LastTime Status calls
1 9f649f366edf 2019-01-21 00:05:15 2019-01-21 00:05:15 Abandoned 1
2 9f649f366edf 2019-01-21 01:52:40 2019-01-21 01:52:40 Abandoned 1
3 9f649f366edf 2019-01-21 19:44:18 2019-01-21 19:44:18 Answered 1
4 9f649f366edf 2019-01-21 19:46:36 2019-01-21 19:46:36 Answered 1
5 9f649f366edf 2019-01-30 04:59:56 2019-01-30 05:23:13 Abandoned 6
6 6bc397bdc516 2019-12-20 06:06:05 2019-12-20 06:14:44 Abandoned 6
7 6bc397bdc516 2019-12-20 18:12:46 2019-12-20 18:12:46 Abandoned 1
8 6bc397bdc516 2019-12-21 11:00:50 2019-12-21 11:00:50 Abandoned 1
9 6bc397bdc516 2019-12-22 17:59:11 2019-12-22 17:59:11 Abandoned 1
10 f24cbff4c81e 2019-06-08 14:34:37 2019-06-08 14:36:45 Abandoned 3
11 f24cbff4c81e 2019-06-08 18:13:16 2019-06-08 18:19:32 Answered 4
12 f24cbff4c81e 2019-06-10 18:11:16 2019-06-10 18:11:16 Answered 1
I am trying to count the rows in df1$DateTime that appear between the OrigTime
and LastTime
column in df2, as per UserId
, because the calls
column in df2 is wrong for some entries (a "call" is a single row within df1).
Here is an example of the output that I want df2 to look like; here is before:-
Using UserId== "f24cbff4c81e"
as an example; this user only has 5 rows (calls) within df1, but if you tally up the calls in df2, this user has 8. See the following before and after:-
Before
df2%>%filter(UserId=="f24cbff4c81e")
UserId OrigTime LastTime Status calls
1 f24cbff4c81e 2019-06-08 14:34:37 2019-06-08 14:36:45 Abandoned 3
2 f24cbff4c81e 2019-06-08 18:13:16 2019-06-08 18:19:32 Answered 4
3 f24cbff4c81e 2019-06-10 18:11:16 2019-06-10 18:11:16 Answered 1
The calls column is wrong as there is not as many rows between OrigTime
and LastTime
within df1$DateTime. Here is the correct outcome of which I want:-
Correct outcome
df2%>%filter(UserId=="f24cbff4c81e")
UserId OrigTime LastTime Status calls
1 f24cbff4c81e 2019-06-08 14:34:37 2019-06-08 14:36:45 Abandoned 2
2 f24cbff4c81e 2019-06-08 18:13:16 2019-06-08 18:19:32 Answered 2
3 f24cbff4c81e 2019-06-10 18:11:16 2019-06-10 18:11:16 Answered 1
For UserId=="f24cbff4c81e"
, there are 2 calls between 2019-06-08 14:34:37 (OrigTime in df2) and 2019-06-08 14:36:45 (LastTime in df2) in df1,
2 calls between 2019-06-08 18:13:16 (OrigTime in df2) and 2019-06-08 18:19:32 (LastTime in df2) in df1, and 1 call that occurred at 2019-06-10 18:11:16 (both OrigTime and LastTime; I'd like to keep the timestamps preserved here if possible) in df1. I hope you can see the logic of what I'm trying to achieve for the other users.
To conclude, if the number of rows between timestamps in df1 (as specified by OrigTime and LastTime in df2) do not match the calls
value in the respective row in df2, I want it changed to the correct value. Any help is massively appreciated :)
Upvotes: 1
Views: 108
Reputation: 12819
We could do:
library(dplyr)
count_calls <- function(UserId, OrigTime, LastTime, df_calls = df1) {
df_calls %>%
dplyr::filter(UserId == !! UserId,
dplyr::between(DateTime, OrigTime, LastTime)) %>%
nrow()
}
df2 %>%
mutate(calls_recount = pmap_int(list(UserId, OrigTime, LastTime), count_calls),
mismatch = calls != calls_recount)
#> UserId OrigTime LastTime Status calls calls_recount mismatch
#> 1 9f649f366edf 2019-01-21 01:05:15 2019-01-21 01:05:15 Abandoned 1 1 FALSE
#> 2 9f649f366edf 2019-01-21 02:52:40 2019-01-21 02:52:40 Abandoned 1 1 FALSE
#> 3 9f649f366edf 2019-01-21 20:44:18 2019-01-21 20:44:18 Answered 1 1 FALSE
#> 4 9f649f366edf 2019-01-21 20:46:36 2019-01-21 20:46:36 Answered 1 1 FALSE
#> 5 9f649f366edf 2019-01-30 05:59:56 2019-01-30 06:23:13 Abandoned 6 5 TRUE
#> 6 6bc397bdc516 2019-12-20 07:06:05 2019-12-20 07:14:44 Abandoned 6 5 TRUE
#> 7 6bc397bdc516 2019-12-20 19:12:46 2019-12-20 19:12:46 Abandoned 1 1 FALSE
#> 8 6bc397bdc516 2019-12-21 12:00:50 2019-12-21 12:00:50 Abandoned 1 1 FALSE
#> 9 6bc397bdc516 2019-12-22 18:59:11 2019-12-22 18:59:11 Abandoned 1 1 FALSE
#> 10 f24cbff4c81e 2019-06-08 15:34:37 2019-06-08 15:36:45 Abandoned 3 2 TRUE
#> 11 f24cbff4c81e 2019-06-08 19:13:16 2019-06-08 19:19:32 Answered 4 2 TRUE
#> 12 f24cbff4c81e 2019-06-10 19:11:16 2019-06-10 19:11:16 Answered 1 1 FALSE
Upvotes: 2