Reputation: 15
I've got a large dataset (dt) which has a time column (in seconds) and a column that records a 1 when some other variables meet a certain value and a 0 when they don't, e.g.:
time (s) var
0 1
0.3 1
0.6 0
0.9 0
1.2 1
1.5 1
1.8 0
Part 1) What I want to do is number each run of repeated 1s (repeated more than twice) as a unique occurrence in a count column, which would look like this:
time (s) var count
0 1 1
0.3 1 1
0.6 0 0
0.9 0 0
1.2 1 2
1.5 1 2
1.8 0 0
where every row in the same bout has the same number, and rows where var is 0 are not counted (count stays 0).
For Part 1 I have this so far, but it only returns a single number. I would like each unique occurrence numbered in a count column instead:
with(rle(dt$var), sum(lengths[values] > 2))
Part 2) I also want to know how long each occurrence lasts. (I also have a replicate column, which has a value of 1 for every row.)
I tried this to calculate Part 2 but it doesn't work...
var_time <- dt %>%
  group_by(replicate) %>%
  mutate(var_time = cumsum(var != lag(var, default = ""))) %>%
  group_by(var, time) %>%
  summarise(start = min(time),
            end = max(time),
            var = sum(var))
Upvotes: 1
Views: 1696
Reputation: 887118
An option with data.table
library(data.table)
setDT(df1)[, count := rleid(var) * var][count != 0,
count := match(count, unique(count))][]
# time var count
#1: 0.0 1 1
#2: 0.3 1 1
#3: 0.6 0 0
#4: 0.9 0 0
#5: 1.2 1 2
#6: 1.5 1 2
#7: 1.8 0 0
Or with base R, using rle/inverse.rle
df1$count <- inverse.rle(within.list(rle(df1$var),
values[as.logical(values)] <- seq_along(values[as.logical(values)])))
Data used above:
df1 <- data.frame(time = c(0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8), var = c(1, 1, 0, 0, 1, 1, 0))
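To see why the rle/inverse.rle approach works, it can help to look at the intermediate pieces. The following is a minimal sketch on the example data; the names `r`, `is_one` and `count` are only illustrative and are not part of the answer above.

```r
# Example var column from the question
var <- c(1, 1, 0, 0, 1, 1, 0)

# rle() compresses the vector into runs: one length and one value per run
r <- rle(var)
r$lengths  # 2 2 2 1
r$values   # 1 0 1 0

# Renumber only the runs whose value is 1, in order of appearance
is_one <- as.logical(r$values)
r$values[is_one] <- seq_len(sum(is_one))

# inverse.rle() expands the runs back to one entry per original row
count <- inverse.rle(r)
count  # 1 1 0 0 2 2 0
```

Note that this numbers every run of 1s, including a run of length one; a minimum run length would need an extra condition on `r$lengths`.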
Upvotes: 1
Reputation: 388982
You can use rle to get the answer to the first part.
dt$count <- with(rle(dt$var), rep(values * cumsum(values & lengths >= 2), lengths))
dt
# time var count
#1 0.0 1 1
#2 0.3 1 1
#3 0.6 0 0
#4 0.9 0 0
#5 1.2 1 2
#6 1.5 1 2
#7 1.8 0 0
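For Part 2, one possible approach is to summarise time by the count column once it exists. The sketch below is only an illustration under stated assumptions: `min_len` and `bout_times` are hypothetical names, a bout's duration is taken as last timestamp minus first timestamp, and a run of 1s shorter than `min_len` is given count 0 rather than inheriting the previous bout's number. With several replicates, the same steps would presumably need to be applied within each replicate.

```r
# Example data from the question
dt <- data.frame(time = c(0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8),
                 var  = c(1, 1, 0, 0, 1, 1, 0))

min_len <- 2  # assumed minimum number of consecutive 1s that counts as a bout

# Part 1: number only the runs of 1s that are long enough
r <- rle(dt$var)
keep <- r$values == 1 & r$lengths >= min_len
dt$count <- rep(cumsum(keep) * keep, r$lengths)

# Part 2: first and last timestamp of each bout, and the difference
bouts <- dt[dt$count > 0, ]
bout_times <- data.frame(
  count = sort(unique(bouts$count)),
  start = as.numeric(tapply(bouts$time, bouts$count, min)),
  end   = as.numeric(tapply(bouts$time, bouts$count, max))
)
bout_times$duration <- bout_times$end - bout_times$start
bout_times
```

On the example data this gives two bouts, starting at 0 and 1.2 seconds. If a bout should instead last until the next sample, the sampling interval (0.3 s here) could be added to each duration.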
Upvotes: 2