I want to use behavioural data to calculate the number of items caught. This is my example data:
df <- data.frame(id = as.factor(c(51,51,51,51,51,51,51,52,52,52,52,52,52)),
type = c("(K)","(K)","(K)","(K)","","","","(K)","(K)","","(K)","","(K)"))
I would like to count my "(K)"s according to whether they are consecutive. A run of consecutive "(K)"s should count as one; if there is a gap between two "(K)"s, each counts as one, so "(K)", "", "(K)" gives a final tally of 2.
Hope that makes sense. For the example above, I would like my final output to look like this:
id type tally
1 51 (K) 1
2 52 (K) 3
I thought aggregate might do this; however, it counts the total number of "(K)"s in the column, so for id 51 it gives tally = 4 rather than 1.
Any help would be appreciated.
Thanks, Grace
The rle command in base R would be useful:
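# For each id, run-length encode a logical "is this row (K)?" vector
temp <- tapply(df$type, df$id, function(x) rle(x == "(K)"))
# Each TRUE in values marks one run of consecutive (K)s, so summing counts the runs
df.new <- data.frame(id = names(temp),
                     tally = unlist(lapply(temp, function(x) sum(x$values))))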
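To see why summing x$values counts the runs, here is what rle returns for id 52's entries on their own (the vector below is copied from the example data). The values component holds one TRUE per run of consecutive "(K)"s, and summing a logical vector counts its TRUEs.

```r
# Types for id 52, taken from the example data
x <- c("(K)", "(K)", "", "(K)", "", "(K)")
r <- rle(x == "(K)")
r$lengths      # 2 1 1 1 1
r$values       # TRUE FALSE TRUE FALSE TRUE
sum(r$values)  # 3 runs of "(K)"
```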
We can try rleid from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), then, grouped by 'id', compute the run-length id of 'type' and store it in 'val'. Finally, keep the rows where 'type' is not blank and, grouped by 'id' and 'type', count the unique values of 'val' (uniqueN).
library(data.table)
setDT(df)[, val := rleid(type), id][type!="", .(tally = uniqueN(val)), .(id, type)]
# id type tally
#1: 51 (K) 1
#2: 52 (K) 3
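rleid gives every run of identical consecutive values its own number, so once the blank rows are dropped, the number of distinct run ids left is the number of "(K)" runs. The sketch below reproduces that idea in base R for id 52 so the intermediate values can be inspected without data.table; run_id is a hypothetical name used only for illustration, not data.table's implementation.

```r
# Types for id 52, taken from the example data
type <- c("(K)", "(K)", "", "(K)", "", "(K)")
# Hypothetical base R stand-in for rleid(): start a new run number
# whenever a value differs from the one before it
run_id <- cumsum(c(TRUE, type[-1] != type[-length(type)]))
run_id                              # 1 1 2 3 4 5
# Keep only the non-blank rows and count the distinct run numbers
length(unique(run_id[type != ""]))  # 3
```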
Or we can use the tidyverse:
library(tidyverse)
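df %>%
  # number the runs: increment whenever type differs from the previous row
  mutate(val = cumsum(type != lag(type, default = type[1]))) %>%
  group_by(id) %>%
  # drop the blank rows so only runs of (K) are counted
  filter(type != "") %>%
  # each run of consecutive (K)s contributes one distinct val
  summarise(type = first(type), tally = n_distinct(val))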
# A tibble: 2 × 3
# id type tally
# <fctr> <fctr> <int>
#1 51 (K) 1
#2 52 (K) 3
In base R, you could do it with rle. First split df by id, and then for each subgroup count the number of runs of "(K)" (the rle lengths whose value is "(K)").
sapply(split(df, df$id), function(a)
length(with(rle(as.character(a$type)), lengths[values == "(K)"])))
#51 52
# 1 3
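The sapply call returns a named vector. If the table shape from the question (id, type, tally) is wanted, the same rle count can be wrapped in a small function. tally_runs is a hypothetical helper name, and the sketch assumes the blank-versus-"(K)" coding of the example data.

```r
# Example data from the question
df <- data.frame(id = as.factor(c(51,51,51,51,51,51,51,52,52,52,52,52,52)),
                 type = c("(K)","(K)","(K)","(K)","","","","(K)","(K)","","(K)","","(K)"))

# Hypothetical helper: count runs of consecutive `target` values per id
tally_runs <- function(data, target = "(K)") {
  counts <- sapply(split(data, data$id), function(a) {
    r <- rle(as.character(a$type))
    sum(r$values == target)  # one TRUE per run of `target`
  })
  data.frame(id = names(counts), type = target, tally = unname(counts))
}

tally_runs(df)
# tally is 1 for id 51 and 3 for id 52, matching the table in the question
```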