Reputation: 13
Very new to R and have been googling this question to death to no avail - any insight would be greatly appreciated!
I have data organized in the following way:
Individual | Behavior | start | end | duration |
---|---|---|---|---|
M1 | A | 0 | 1.505 | 1.505 |
M1 | B | 1.505 | 106.256 | 104.751 |
M1 | E | 3.266 | 67.255 | 63.989 |
M1 | E | 71.268 | 77.506 | 6.238 |
Duration, start time, and end time are all in seconds.
I want to sum the duration of each behavior in 1 minute bins - basically say this individual spent this amount of time doing behavior A in the first minute, this amount of time doing behavior B in the first minute, and so on for each behavior and each minute. Eventually, I would like to average time spent doing each behavior in each bin across experimental groups, and make a line graph for each experimental group, with duration on the y axis and 1 min bins on the x axis, with different series for each behavior.
<Edited to add: For the example above, the desired outcome would be:
M1 spent 1.505 seconds doing behavior A in bin 1 (0 seconds - 60 seconds)
M1 spent 0 seconds doing behavior A in bin 2 (>60 seconds - 120 seconds)
M1 spent 58.495 seconds doing behavior B in bin 1 (0 seconds - 60 seconds)
M1 spent 46.256 seconds doing behavior B in bin 2 (>60 seconds - 120 seconds)
M1 spent 56.734 seconds doing behavior E in bin 1 (0 seconds - 60 seconds)
M1 spent 13.493 seconds doing behavior E in bin 2 (>60 seconds - 120 seconds)
I'm not picky about how this data would be displayed, when I had binned previously it would just add a column labeled "bin" and assign the bin number for each row, but I don't think that will be possible now because behaviors can span multiple bins.>
The issue though, is that the way I have binned before (in excel and SPSS) is by defining what bin the start time falls into - but in this case for a lot of behaviors the duration is over 1 minute, so the start time and end time belong in different bins. Is there a way to parse this out in r?
Thanks!
Upvotes: 1
Views: 65
Reputation: 72984
You could try this function that uses modulo operators,
f <- \(x, y) {
a <- c(x, y) %/% 60 + 1 ## bin range
if (var(a) == 0) { ## if there's just 1 bin
s <- y - x
} else {
s <- numeric(diff(a)) ## create bin vector
s[seq_along(s) > a[1]] <- 60 ## fill used bins
s[a[1]] <- 60 - x %% 60 ## partly fill 1st used bin
s[a[2] - 1] <- y %% 60 ## partly fill last used bin
}
data.frame(bin=do.call('seq.int', c(as.list(range(a)), 1)), secs=s)
}
and gives:
> list(bins=Map(f, df$start, df$end))
$bins
$bins[[1]]
bin secs
1 1 1.505
$bins[[2]]
bin secs
1 1 46.256
2 2 46.256
[...]
To map it to individuals and behavior, we'll bind it to df, using list2DF
, split
it along the interactions and aggregate
, finally rbind
. I've added a second individual with rather unexpected values (including empty 1st bin) for more fun:
> list2DF(c(as.list(df), list(bins=Map(f, df$start, df$end)))) |>
+ split(~list(Behavior, Individual)) |>
+ lapply(\(.) {
+ if (nrow(.)) { ## circumvent interactions w/ zero rows
+ data.frame(`rownames<-`(unique(.[c("Individual", "Behavior")]), NULL),
+ aggregate(secs ~ bin, do.call('rbind', .$bins), sum))
+ }
+ }) |>
+ c(make.row.names=FALSE) |>
+ do.call(what='rbind')
Individual Behavior bin secs
1 M1 A 1 1.505
2 M1 B 1 46.256
3 M1 B 2 46.256
4 M1 E 1 7.255
5 M1 E 2 13.493
6 M2 A 1 1.700
7 M2 B 2 0.000
8 M2 B 3 19.300
9 M2 B 4 60.000
10 M2 B 5 60.000
11 M2 B 6 60.000
12 M2 B 7 45.500
13 M2 C 2 4.800
14 M2 E 1 6.500
15 M2 E 2 12.800
Data:
> dput(df)
structure(list(Individual = c("M1", "M1", "M1", "M1", "M2", "M2",
"M2", "M2", "M2"), Behavior = c("A", "B", "E", "E", "A", "B",
"C", "E", "E"), start = c(0, 1.505, 3.266, 71.268, 0, 100.7,
100.7, 4, 70.5), end = c(1.505, 106.256, 67.255, 77.506, 1.7,
405.5, 105.5, 66.5, 76.8), duration = c(1.505, 104.751, 63.989,
6.238, 1.7, 304.8, 4.8, 62.5, 6.3)), class = "data.frame", row.names = c(NA,
-9L))
Upvotes: 0
Reputation: 173
Other way if you want see intermediate files:
Instead of chaining everything into one complex pipeline, it uses separate mutate
and summarize
steps for clarity
# Create bins
bindf = df %>% rowwise() %>%
mutate(bins = list(seq(floor(start / 60), floor(end / 60), by = 1))) %>%
unnest(bins) %>%
mutate(secs_active = pmin(end, (bins + 1) * 60) - pmax(start, bins * 60)) %>%
filter(secs_active > 0)
# Summarize durations for each bin
df = bindf %>%group_by(Individual, Behavior, bins) %>%
summarize(secs_active = sum(secs_active), .groups = "drop")
df = df %>% rename(bin = bins)
df
Upvotes: 1
Reputation: 66520
One way in tidyverse below.
First, I reframe
the data giving a row for each relevant bin within each Indiv/Behavior combo. Then I find how much of the time was within that bin. Finally I sum up the totals by Indiv/Behav/Bin in case there are multiples.
library(tidyverse)
df |>
reframe(bin = seq(floor(start/60), floor(end/60), 1),
.by = c(Individual, Behavior, start, end, duration)) |>
mutate(secs_active = pmin(end, (bin+1)*60) - pmax(start, bin*60)) |>
summarize(secs_active = sum(secs_active), .by = c(Individual, Behavior, bin))
Result
Individual Behavior bin secs_active
1 M1 A 0 1.505
2 M1 B 0 58.495
3 M1 B 1 46.256
4 M1 E 0 56.734
5 M1 E 1 13.493
Upvotes: 2