Reputation: 783
I have a dataset of 0s and 1s and amounts attached to them,
test = data.frame(seq = c(0,0,0,1,1,0,1,0,0), amount = c(91.0, 100.0, 0.0, 4.5, 5.5, 3.0, 23.0, 89.0, 56.0))
seq amount
1 0 91.0
2 0 100.0
3 0 0.0
4 1 4.5
5 1 5.5
6 0 3.0
7 1 23.0
8 0 89.0
9 0 56.0
An event is defined by the first 0 in a subsequence of 0s. I am interested in the number of zeros (count) in each event, as well as the amount sums.
For test above, we would have:
So, I would like to create the following table,
| Event | count | amount |
|------------|--------------|----------|
| 1 | 3 | 191.0 |
| 2 | 1 | 3.0 |
| 3 | 2 | 145.0 |
In an earlier post, @27 ϕ 9 sent me this great suggestion for the Event and count columns.
with(rle(test), data.frame(id = sequence(sum(values == 0)), count = lengths[values == 0]))
But how can I add the amount sums still using rle?
Upvotes: 0
Views: 180
Reputation: 39667
In case there is no urged need to use rle
aggregate
could be used like:
i <- which(test$seq == 0)
aggregate(cbind(count=1, amount=test$amount[i]),
list(Event=cumsum(c(1, diff(i) > 1))), sum)
# Event count amount
#1 1 3 191
#2 2 1 3
#3 3 2 145
Or rowsum
:
i <- which(test$seq == 0)
rowsum(cbind(count=1, amount=test$amount[i]), cumsum(c(1, diff(i) > 1)))
# count amount
#1 3 191
#2 1 3
#3 2 145
Upvotes: 0
Reputation: 388982
You may use data.table::rleid
to create group of consecutive runs, count the number of rows in each group and sum
the amount
.
library(dplyr)
res <- test %>%
group_by(Event = data.table::rleid(seq)) %>%
summarise(seq = first(seq),
count = n(),
amount = sum(amount))
res
# Event seq count amount
# <int> <dbl> <int> <dbl>
#1 1 0 3 191
#2 2 1 2 10
#3 3 0 1 3
#4 4 1 1 23
#5 5 0 2 145
If you are interested only in 0 sequence -
res %>%
filter(seq == 0) %>%
mutate(Event = row_number()) %>%
select(-seq)
# Event count amount
# <int> <int> <dbl>
#1 1 3 191
#2 2 1 3
#3 3 2 145
If you are interested in continuing with the rle
approach you can do -
with(rle(test$seq), data.frame(id = sequence(sum(values == 0)),
count = lengths[values == 0],
amount = tapply(test$amount, rep(seq_along(values), lengths), sum)[values == 0]))
# id count amount
#1 1 3 191
#3 2 1 3
#5 3 2 145
Upvotes: 1