How to add rows when time interval straddles minute mark

Question

I have data with speech events timed from start to end in milliseconds:

df <- data.frame(
  speaker = c(NA, "A", NA, "B", NA, "C", NA, "C"),
  start = c(0,20000,35000,65000,80000,100000,110000,140000),
  end = c(20000,35000,65000,80000,100000,110000,140000,195000)
)

I need to add rows when the start and end times of a row straddle a full-minute mark, while clipping the end time of the straddling row to the exact full-minute mark (60000, 120000, etc.) and assigning the 'rest' to the end value in the added row. The desired output would be:

   speaker  start    end
1     <NA>      0  20000
2        A  20000  35000
3     <NA>  35000  60000   # clipped end time
4     <NA>  60000  65000   # added row
5        B  65000  80000
6     <NA>  80000 100000
7        C 100000 110000
8     <NA> 110000 120000   # clipped end time
9     <NA> 120000 140000   # added row
10       C 140000 180000   # clipped end time
11       C 180000 195000   # added row

EDIT:

Here's my (long-winded) solution:

Step 1: Create a dataframe in which the minute-straddling rows are flagged:

library(dplyr)

df0 <- df %>%
  mutate(
    minute_start = as.integer(start/60000),
    minute_end = as.integer(end/60000),
    straddler = minute_end > minute_start)
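One edge case worth checking in this flag: a row whose end lies exactly on a minute mark is also flagged, even though it only touches the mark and does not cross it. The sketch below uses a hypothetical row (35000 to 60000, not part of the data above) and shows one possible stricter test, assuming start < end. The names `minute`, `flag_step1`, `next_mark`, and `flag_strict` are introduced here for illustration only.

```r
minute <- 60000                      # milliseconds per minute
start <- 35000                       # hypothetical row, not in the question's data
end <- 60000                         # ends exactly on the first minute mark

# The Step 1 flag: TRUE, although the interval only touches the mark
flag_step1 <- as.integer(end / minute) > as.integer(start / minute)

# Stricter alternative: the first mark after start must lie strictly before end
next_mark <- (start %/% minute + 1) * minute
flag_strict <- next_mark < end

c(step1 = flag_step1, strict = flag_strict)
```

With the data in the question, no end value falls exactly on a minute mark, so both tests flag the same rows there.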

Step 2: Create another dataframe subsetted on the flagged rows, where the end value is mutated to the exact minute mark:

df1 <- df0 %>%
  filter(straddler) %>%
  mutate(end = minute_end*60000)

Step 3: Create another dataframe subsetted on the flagged rows, where the start value is mutated to the exact minute mark:

df2 <- df0 %>%
  filter(straddler) %>%
  mutate(start = minute_end*60000)

Step 4: Create another dataframe without the flagged rows:

df3 <- df0 %>%
  filter(!straddler)

Step 5: Finally, rbind the dataframes, arranging them by start:

arrange(rbind(df1, df2, df3), start)
   speaker  start    end minute_start minute_end straddler
1     <NA>      0  20000            0          0     FALSE
2        A  20000  35000            0          0     FALSE
3     <NA>  35000  60000            0          1      TRUE
4     <NA>  60000  65000            0          1      TRUE
5        B  65000  80000            1          1     FALSE
6     <NA>  80000 100000            1          1     FALSE
7        C 100000 110000            1          1     FALSE
8     <NA> 110000 120000            1          2      TRUE
9     <NA> 120000 140000            1          2      TRUE
10       C 140000 180000            2          3      TRUE
11       C 180000 195000            2          3      TRUE
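For comparison, the same five steps can be condensed in base R without the helper columns ending up in the result. This is only a sketch under the question's assumption that each row crosses at most one minute mark; the names `minute`, `is_straddler`, `mark`, `clipped`, `added`, and `out` are introduced here for illustration.

```r
df <- data.frame(
  speaker = c(NA, "A", NA, "B", NA, "C", NA, "C"),
  start = c(0, 20000, 35000, 65000, 80000, 100000, 110000, 140000),
  end = c(20000, 35000, 65000, 80000, 100000, 110000, 140000, 195000)
)

minute <- 60000                                  # milliseconds per minute
is_straddler <- df$end %/% minute > df$start %/% minute
mark <- (df$end %/% minute) * minute             # minute mark at or before each end

clipped <- df[is_straddler, ]                    # first half: end clipped to the mark
clipped$end <- mark[is_straddler]
added <- df[is_straddler, ]                      # second half: starts at the mark
added$start <- mark[is_straddler]

out <- rbind(df[!is_straddler, ], clipped, added)
out <- out[order(out$start), ]                   # restore chronological order
rownames(out) <- NULL
out
```

The result has the same 11 rows as the output above, but only the speaker, start, and end columns.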

If anybody knows how to improve this, it'd be much appreciated!

ktiu · Accepted Answer

Interesting question! I came up with this:

library(dplyr)
library(purrr)
library(tibble)

# Compute the 1-based minute bin of each start and end time
mutate(df, across(-speaker, ~ floor(.x / 60000) + 1, .names = "mnt_{.col}")) %>%
  # Return one row per interval, or two rows when it crosses a minute mark
  pmap(\(...) {
    with(list(...), if (mnt_start == mnt_end) tibble(speaker, start, end)
                    else tibble(speaker = rep(speaker, 2),
                                start = c(start, mnt_start * 60000),
                                end = c(mnt_start * 60000, end)))
  }) %>%
  bind_rows()

Returning:

# A tibble: 11 x 3
   speaker  start    end
   <chr>    <dbl>  <dbl>
 1 NA           0  20000
 2 A        20000  35000
 3 NA       35000  60000
 4 NA       60000  65000
 5 B        65000  80000
 6 NA       80000 100000
 7 C       100000 110000
 8 NA      110000 120000
 9 NA      120000 140000
10 C       140000 180000
11 C       180000 195000
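
Note that this splits a row at one minute mark only. If an interval could span several marks (for example 50000 to 190000) or end exactly on a mark, a more general variant may be needed. The following base R sketch is one way to do that; `split_at_minutes()` and its `width` argument are hypothetical names introduced here, not part of the code above.

```r
df <- data.frame(
  speaker = c(NA, "A", NA, "B", NA, "C", NA, "C"),
  start = c(0, 20000, 35000, 65000, 80000, 100000, 110000, 140000),
  end = c(20000, 35000, 65000, 80000, 100000, 110000, 140000, 195000)
)

# Hypothetical helper: split one interval at every full-minute mark that lies
# strictly between start and end (width is the bin size in milliseconds).
split_at_minutes <- function(speaker, start, end, width = 60000) {
  first_mark <- (start %/% width + 1) * width        # first mark after start
  marks <- seq(first_mark, max(first_mark, end), by = width)
  marks <- marks[marks < end]                        # keep marks inside the interval
  breaks <- c(start, marks, end)
  data.frame(
    speaker = speaker,                               # recycled for every piece
    start = head(breaks, -1),
    end = tail(breaks, -1),
    stringsAsFactors = FALSE
  )
}

# Apply the helper row by row and stack the pieces
pieces <- lapply(seq_len(nrow(df)), function(i) {
  split_at_minutes(df$speaker[i], df$start[i], df$end[i])
})
result <- do.call(rbind, pieces)
result
```

For the question's data this gives the same 11 rows as above. A call such as `split_at_minutes("X", 50000, 190000)` returns four rows, with breaks at 60000, 120000, and 180000, while an interval ending exactly on a mark is returned unchanged as a single row.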
