Reputation: 15
I am working on a speaking time dataset (time series) using R. This data shows the In and Out time of different speakers. I want to (1) combine consecutive rows that belong to the same speaker by time order and (2) keep the time in the In
column of the first row and the time in the last Out
column in the new merged row. Could someone guide me how to solve this problem? Many thanks!
The original data:
structure(list(In = c("15:22.5", "15:31.2", "15:38.1", "15:45.4",
"15:52.2", "16:11.0", "16:23.4", "16:35.3", "16:36.9", "16:47.4",
"17:06.0", "17:11.2", "17:18.7", "17:26.7", "17:34.6", "18:05.0",
"18:17.5", "18:28.9", "18:32.4", "19:00.4", "19:41.3", "20:01.6"
), Out = c("15:27.7", "15:36.9", "15:45.2", "15:52.0", "16:10.8",
"16:22.0", "16:35.0", "16:36.8", "16:37.8", "17:04.8", "17:08.3",
"17:17.0", "17:23.8", "17:27.2", "18:04.3", "18:06.0", "18:24.3",
"18:31.8", "18:59.1", "19:40.2", "19:53.1", "20:19.1"), Speaker = c("Y",
"Y T", "Y", "T", "ATA", "Y", "T", "T", "Y", "Y T", "Y", "T",
"Y", "Y", "Y", "Y", "A T", "T", "T", "T", "T", "A TY")), class = "data.frame", row.names = c(NA,
-22L))
Expected Output:
structure(list(In = c("15:22.5", "15:31.2", "15:38.1", "15:45.4",
"15:52.2", "16:11.0", "16:23.4", "16:36.9", "16:47.4", "17:06.0",
"17:11.2", "17:18.7", "18:17.5", "18:28.9", "20:01.6"), Out = c("15:27.7",
"15:36.9", "15:45.2", "15:52.0", "16:10.8", "16:22.0", "16:36.8",
"16:37.8", "17:04.8", "17:08.3", "17:17.0", "18:06.0", "18:24.3",
"19:53.1", "20:19.1"), Speaker = c("Y", "Y T", "Y", "T", "ATA",
"Y", "T", "Y", "Y T", "Y", "T", "Y", "A T", "T", "A TY")), class = "data.frame", row.names = c(NA,
-15L))
Upvotes: 1
Views: 365
Reputation: 886938
We can create a grouping variable with rleid
from data.table
i.e. based on the similar adjacent values of 'Speaker', then summarise
by taking the first
value of 'In' and last
of 'Out' columns
library(dplyr)
library(data.table)
df1 %>%
group_by(grp = rleid(Speaker), Speaker) %>%
summarise(In = first(In), Out = last(Out), .groups = 'drop') %>%
select(names(df1))
-output
# A tibble: 15 x 3
# In Out Speaker
# <chr> <chr> <chr>
# 1 15:22.5 15:27.7 Y
# 2 15:31.2 15:36.9 Y T
# 3 15:38.1 15:45.2 Y
# 4 15:45.4 15:52.0 T
# 5 15:52.2 16:10.8 ATA
# 6 16:11.0 16:22.0 Y
# 7 16:23.4 16:36.8 T
# 8 16:36.9 16:37.8 Y
# 9 16:47.4 17:04.8 Y T
#10 17:06.0 17:08.3 Y
#11 17:11.2 17:17.0 T
#12 17:18.7 18:06.0 Y
#13 18:17.5 18:24.3 A T
#14 18:28.9 19:53.1 T
#15 20:01.6 20:19.1 A TY
Upvotes: 1