Reputation: 13

Binning duration data in R with a start and end time for data that spans multiple bins

Very new to R and have been googling this question to death to no avail - any insight would be greatly appreciated!

I have data organized in the following way:

Individual	Behavior	start	end	duration
M1	A	0	1.505	1.505
M1	B	1.505	106.256	104.751
M1	E	3.266	67.255	63.989
M1	E	71.268	77.506	6.238

Duration, start time, and end time are all in seconds.

I want to sum the duration of each behavior in 1 minute bins - basically say this individual spent this amount of time doing behavior A in the first minute, this amount of time doing behavior B in the first minute, and so on for each behavior and each minute. Eventually, I would like to average time spent doing each behavior in each bin across experimental groups, and make a line graph for each experimental group, with duration on the y axis and 1 min bins on the x axis, with different series for each behavior.

<Edited to add: For the example above, the desired outcome would be:

M1 spent 1.505 seconds doing behavior A in bin 1 (0 seconds - 60 seconds)

M1 spent 0 seconds doing behavior A in bin 2 (>60 seconds - 120 seconds)

M1 spent 58.495 seconds doing behavior B in bin 1 (0 seconds - 60 seconds)

M1 spent 46.256 seconds doing behavior B in bin 2 (>60 seconds - 120 seconds)

M1 spent 56.734 seconds doing behavior E in bin 1 (0 seconds - 60 seconds)

M1 spent 13.493 seconds doing behavior E in bin 2 (>60 seconds - 120 seconds)

I'm not picky about how this data would be displayed, when I had binned previously it would just add a column labeled "bin" and assign the bin number for each row, but I don't think that will be possible now because behaviors can span multiple bins.>

The issue though, is that the way I have binned before (in excel and SPSS) is by defining what bin the start time falls into - but in this case for a lot of behaviors the duration is over 1 minute, so the start time and end time belong in different bins. Is there a way to parse this out in r?

Thanks!

Upvotes: 1

Answers (3)

jay.sf

Reputation: 72984

You could try this function that uses modulo operators,

f <- \(x, y) {
  a <- c(x, y) %/% 60 + 1  ## bin range
  if (var(a) == 0) {       ## if there's just 1 bin
    s <- y - x
  } else {
    s <- numeric(diff(a))         ## create bin vector
    s[seq_along(s) > a[1]] <- 60  ## fill used bins
    s[a[1]] <- 60 - x %% 60  ## partly fill 1st used bin
    s[a[2] - 1] <- y %% 60   ## partly fill last used bin
  }
  data.frame(bin=do.call('seq.int', c(as.list(range(a)), 1)), secs=s)
}

and gives:

> list(bins=Map(f, df$start, df$end))
$bins
$bins[[1]]
  bin  secs
1   1 1.505

$bins[[2]]
  bin   secs
1   1 46.256
2   2 46.256
[...]

To map it to individuals and behavior, we'll bind it to df, using list2DF, split it along the interactions and aggregate, finally rbind. I've added a second individual with rather unexpected values (including empty 1st bin) for more fun:

> list2DF(c(as.list(df), list(bins=Map(f, df$start, df$end)))) |> 
+   split(~list(Behavior, Individual)) |> 
+   lapply(\(.) {
+     if (nrow(.)) {  ## circumvent interactions w/ zero rows
+       data.frame(`rownames<-`(unique(.[c("Individual", "Behavior")]), NULL),
+                  aggregate(secs ~ bin, do.call('rbind', .$bins), sum))
+     }
+   }) |> 
+   c(make.row.names=FALSE) |> 
+   do.call(what='rbind')
   Individual Behavior bin   secs
1          M1        A   1  1.505
2          M1        B   1 46.256
3          M1        B   2 46.256
4          M1        E   1  7.255
5          M1        E   2 13.493
6          M2        A   1  1.700
7          M2        B   2  0.000
8          M2        B   3 19.300
9          M2        B   4 60.000
10         M2        B   5 60.000
11         M2        B   6 60.000
12         M2        B   7 45.500
13         M2        C   2  4.800
14         M2        E   1  6.500
15         M2        E   2 12.800

Data:

> dput(df)
structure(list(Individual = c("M1", "M1", "M1", "M1", "M2", "M2", 
"M2", "M2", "M2"), Behavior = c("A", "B", "E", "E", "A", "B", 
"C", "E", "E"), start = c(0, 1.505, 3.266, 71.268, 0, 100.7, 
100.7, 4, 70.5), end = c(1.505, 106.256, 67.255, 77.506, 1.7, 
405.5, 105.5, 66.5, 76.8), duration = c(1.505, 104.751, 63.989, 
6.238, 1.7, 304.8, 4.8, 62.5, 6.3)), class = "data.frame", row.names = c(NA, 
-9L))

Upvotes: 0

jynxmaze

Reputation: 173

Other way if you want see intermediate files: Instead of chaining everything into one complex pipeline, it uses separate mutate and summarize steps for clarity

# Create bins
bindf = df %>% rowwise() %>%
  mutate(bins = list(seq(floor(start / 60), floor(end / 60), by = 1))) %>%
  unnest(bins) %>%
  mutate(secs_active = pmin(end, (bins + 1) * 60) - pmax(start, bins * 60)) %>%
  filter(secs_active > 0)
# Summarize durations for each bin
df = bindf %>%group_by(Individual, Behavior, bins) %>%
     summarize(secs_active = sum(secs_active), .groups = "drop")
df = df %>% rename(bin = bins)
df

Upvotes: 1

Jon Spring

Reputation: 66520

One way in tidyverse below.

First, I reframe the data giving a row for each relevant bin within each Indiv/Behavior combo. Then I find how much of the time was within that bin. Finally I sum up the totals by Indiv/Behav/Bin in case there are multiples.

library(tidyverse)
df |>
  reframe(bin = seq(floor(start/60), floor(end/60), 1),
          .by = c(Individual, Behavior, start, end, duration)) |>
  mutate(secs_active = pmin(end, (bin+1)*60) - pmax(start, bin*60)) |>
  summarize(secs_active = sum(secs_active), .by = c(Individual, Behavior, bin))

Result

  Individual Behavior bin secs_active
1         M1        A   0       1.505
2         M1        B   0      58.495
3         M1        B   1      46.256
4         M1        E   0      56.734
5         M1        E   1      13.493

Upvotes: 2

Binning duration data in R with a start and end time for data that spans multiple bins

Answers (3)

Related Questions