count observations by group based on conditions of 2 variables in R

Question

This is the shor example data. Original data has many columns and rows.

head(df, 15)

    ID   col1   col2
1   1  green yellow
2   1  green   blue
3   1  green  green
4   2 yellow   blue
5   2 yellow yellow
6   2 yellow   blue
7   3 yellow yellow
8   3 yellow yellow
9   3 yellow   blue
10  4   blue yellow
11  4   blue yellow
12  4   blue yellow
13  5 yellow yellow
14  5 yellow   blue
15  5 yellow yellow

what I want to count how many different colors in col2 including the color of col1. For ex: for the ID=4, there is only 1 color in col2. if we include col1, there are 2 different colors. So output should be 2 and so on.

I tried in this way, but it doesn't give me my desired output: ID = 4 turns into 0 which is not I want. So how could I tell R to count them including color in col1?

out <- df %>%
  group_by(ID) %>%
  mutate(N = ifelse(col1 != col2, 1, 0))

My desired output is something like this:

ID  col1    count
1   green   3
2   yellow  2
3   yellow  2
4   blue    2
5   yellow  2

tmfmnk · Accepted Answer

You can do:

df %>%
 group_by(ID, col1) %>%
 summarise(count = n_distinct(col2))

     ID col1   count
     
1     1 green      3
2     2 yellow     2
3     3 yellow     2
4     4 blue       1
5     5 yellow     2

Or even:

df %>%
 group_by(ID, col1) %>%
 summarise_all(n_distinct)

     ID col1    col2
     
1     1 green      3
2     2 yellow     2
3     3 yellow     2
4     4 blue       1
5     5 yellow     2

To group by every three rows:

df %>%
 group_by(group = gl(n()/3, 3), col1) %>%
 summarise(count = n_distinct(col2))

count observations by group based on conditions of 2 variables in R

Answers (1)

Related Questions