Tony Atwater
Tony Atwater

Reputation: 15

R: Combine consecutive rows based on the previous row

I am working on a speaking time dataset (time series) using R. This data shows the In and Out time of different speakers. I want to (1) combine consecutive rows that belong to the same speaker by time order and (2) keep the time in the In column of the first row and the time in the last Out column in the new merged row. Could someone guide me how to solve this problem? Many thanks!

The original data:

structure(list(In = c("15:22.5", "15:31.2", "15:38.1", "15:45.4", 
"15:52.2", "16:11.0", "16:23.4", "16:35.3", "16:36.9", "16:47.4", 
"17:06.0", "17:11.2", "17:18.7", "17:26.7", "17:34.6", "18:05.0", 
"18:17.5", "18:28.9", "18:32.4", "19:00.4", "19:41.3", "20:01.6"
), Out = c("15:27.7", "15:36.9", "15:45.2", "15:52.0", "16:10.8", 
"16:22.0", "16:35.0", "16:36.8", "16:37.8", "17:04.8", "17:08.3", 
"17:17.0", "17:23.8", "17:27.2", "18:04.3", "18:06.0", "18:24.3", 
"18:31.8", "18:59.1", "19:40.2", "19:53.1", "20:19.1"), Speaker = c("Y", 
"Y T", "Y", "T", "ATA", "Y", "T", "T", "Y", "Y T", "Y", "T", 
"Y", "Y", "Y", "Y", "A T", "T", "T", "T", "T", "A TY")), class = "data.frame", row.names = c(NA, 
-22L))

Expected Output:

structure(list(In = c("15:22.5", "15:31.2", "15:38.1", "15:45.4", 
"15:52.2", "16:11.0", "16:23.4", "16:36.9", "16:47.4", "17:06.0", 
"17:11.2", "17:18.7", "18:17.5", "18:28.9", "20:01.6"), Out = c("15:27.7", 
"15:36.9", "15:45.2", "15:52.0", "16:10.8", "16:22.0", "16:36.8", 
"16:37.8", "17:04.8", "17:08.3", "17:17.0", "18:06.0", "18:24.3", 
"19:53.1", "20:19.1"), Speaker = c("Y", "Y T", "Y", "T", "ATA", 
"Y", "T", "Y", "Y T", "Y", "T", "Y", "A T", "T", "A TY")), class = "data.frame", row.names = c(NA, 
-15L))

Upvotes: 1

Views: 365

Answers (1)

akrun
akrun

Reputation: 886938

We can create a grouping variable with rleid from data.table i.e. based on the similar adjacent values of 'Speaker', then summarise by taking the first value of 'In' and last of 'Out' columns

library(dplyr)
library(data.table)
df1 %>% 
    group_by(grp = rleid(Speaker), Speaker) %>% 
    summarise(In = first(In), Out = last(Out), .groups = 'drop') %>%
    select(names(df1))

-output

# A tibble: 15 x 3
#   In      Out     Speaker
#   <chr>   <chr>   <chr>  
# 1 15:22.5 15:27.7 Y      
# 2 15:31.2 15:36.9 Y T    
# 3 15:38.1 15:45.2 Y      
# 4 15:45.4 15:52.0 T      
# 5 15:52.2 16:10.8 ATA    
# 6 16:11.0 16:22.0 Y      
# 7 16:23.4 16:36.8 T      
# 8 16:36.9 16:37.8 Y      
# 9 16:47.4 17:04.8 Y T    
#10 17:06.0 17:08.3 Y      
#11 17:11.2 17:17.0 T      
#12 17:18.7 18:06.0 Y      
#13 18:17.5 18:24.3 A T    
#14 18:28.9 19:53.1 T      
#15 20:01.6 20:19.1 A TY   

Upvotes: 1

Related Questions