Reputation: 1304
I have a very basic question that I am a little struggling with, I have a panel large panel dataset that looks something like this:
df <- data.frame(id= c(1,1,1,2,2,2,3,3,3,4,4,4), time=c(1,2,3,1,2,3,1,2,3,1,2,3), x = c(0,0,0,0,1,1,0,0,1,0,1,2))
I would like to find a compact way to count how many times my x variable changes for every id. The final dataset should look something like this
df <- data.frame(id= c(1,1,1,2,2,2,3,3,3,4,4,4), time=c(1,2,3,1,2,3,1,2,3,1,2,3), x = c(0,0,0,0,1,1,0,0,1,0,1,2),count= c(0,0,0,1,1,1,1,1,1,2,2,2))
Ideally I would like to use dplyr
I was thinking i should do something like like
library(dplyr)
df <- df %>% group_by(id) %>% mutate(count=)
But I am not sure how to complete it because I don't know what kind of command I can use to count changes of scores.
Thanks a lot in advance for your help
Upvotes: 2
Views: 582
Reputation: 887251
We can use rle
with n_distinct
library(dplyr)
df %>%
group_by(id) %>%
mutate(count = n_distinct(rle(x)$values)-1)
# A tibble: 12 x 4
# Groups: id [4]
# id time x count
# <dbl> <dbl> <dbl> <dbl>
# 1 1 1 0 0
# 2 1 2 0 0
# 3 1 3 0 0
# 4 2 1 0 1
# 5 2 2 1 1
# 6 2 3 1 1
# 7 3 1 0 1
# 8 3 2 0 1
# 9 3 3 1 1
#10 4 1 0 2
#11 4 2 1 2
#12 4 3 2 2
Or with data.table
library(data.table)
setDT(df)[, count := uniqueN(rleid(x)) - 1, id]
Upvotes: 0
Reputation: 34511
You can use the sum of the lagged difference of x not equal to zero:
library(dplyr)
df %>%
group_by(id) %>%
mutate(count = sum(diff(x) != 0))
id time x count
1 1 1 0 0
2 1 2 0 0
3 1 3 0 0
4 2 1 0 1
5 2 2 1 1
6 2 3 1 1
7 3 1 0 1
8 3 2 0 1
9 3 3 1 1
10 4 1 0 2
11 4 2 1 2
12 4 3 2 2
Upvotes: 3
Reputation: 389055
Using dplyr
, here is one way using lag
library(dplyr)
df %>%
group_by(id) %>%
mutate(count = length(unique(cumsum(x != lag(x, default = first(x))))) - 1)
# id time x count
# <dbl> <dbl> <dbl> <dbl>
# 1 1 1 0 0
# 2 1 2 0 0
# 3 1 3 0 0
# 4 2 1 0 1
# 5 2 2 1 1
# 6 2 3 1 1
# 7 3 1 0 1
# 8 3 2 0 1
# 9 3 3 1 1
#10 4 1 0 2
#11 4 2 1 2
#12 4 3 2 2
which becomes little shorter with data.table::rleid
df %>%
group_by(id) %>%
mutate(count = length(unique(data.table::rleid(x))) - 1)
Upvotes: 0