Alex
Alex

Reputation: 1304

counting how many times a variable changes score by group

I have a very basic question that I am a little struggling with, I have a panel large panel dataset that looks something like this:

df <- data.frame(id= c(1,1,1,2,2,2,3,3,3,4,4,4), time=c(1,2,3,1,2,3,1,2,3,1,2,3), x = c(0,0,0,0,1,1,0,0,1,0,1,2))

I would like to find a compact way to count how many times my x variable changes for every id. The final dataset should look something like this

df <- data.frame(id= c(1,1,1,2,2,2,3,3,3,4,4,4), time=c(1,2,3,1,2,3,1,2,3,1,2,3), x = c(0,0,0,0,1,1,0,0,1,0,1,2),count= c(0,0,0,1,1,1,1,1,1,2,2,2))

Ideally I would like to use dplyr

I was thinking i should do something like like

library(dplyr)
df <- df %>% group_by(id) %>% mutate(count=)

But I am not sure how to complete it because I don't know what kind of command I can use to count changes of scores.

Thanks a lot in advance for your help

Upvotes: 2

Views: 582

Answers (3)

akrun
akrun

Reputation: 887251

We can use rle with n_distinct

library(dplyr)
df %>%
   group_by(id) %>% 
   mutate(count = n_distinct(rle(x)$values)-1)
# A tibble: 12 x 4
# Groups:   id [4]
#      id  time     x count
#   <dbl> <dbl> <dbl> <dbl>
# 1     1     1     0     0
# 2     1     2     0     0
# 3     1     3     0     0
# 4     2     1     0     1
# 5     2     2     1     1
# 6     2     3     1     1
# 7     3     1     0     1
# 8     3     2     0     1
# 9     3     3     1     1
#10     4     1     0     2
#11     4     2     1     2
#12     4     3     2     2

Or with data.table

library(data.table)
setDT(df)[, count := uniqueN(rleid(x)) - 1, id]

Upvotes: 0

lroha
lroha

Reputation: 34511

You can use the sum of the lagged difference of x not equal to zero:

library(dplyr)

 df %>% 
   group_by(id) %>%
   mutate(count = sum(diff(x) != 0))

   id time x count
1   1    1 0     0
2   1    2 0     0
3   1    3 0     0
4   2    1 0     1
5   2    2 1     1
6   2    3 1     1
7   3    1 0     1
8   3    2 0     1
9   3    3 1     1
10  4    1 0     2
11  4    2 1     2
12  4    3 2     2

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 389055

Using dplyr, here is one way using lag

library(dplyr)
df %>%
  group_by(id) %>%
  mutate(count = length(unique(cumsum(x != lag(x, default = first(x))))) - 1)


#     id  time     x count
#   <dbl> <dbl> <dbl> <dbl>
# 1     1     1     0     0
# 2     1     2     0     0
# 3     1     3     0     0
# 4     2     1     0     1
# 5     2     2     1     1
# 6     2     3     1     1
# 7     3     1     0     1
# 8     3     2     0     1
# 9     3     3     1     1
#10     4     1     0     2
#11     4     2     1     2
#12     4     3     2     2

which becomes little shorter with data.table::rleid

df %>%
  group_by(id) %>%
  mutate(count = length(unique(data.table::rleid(x))) - 1)

Upvotes: 0

Related Questions