Amazonian

Reputation: 462

R: count the number of distinct values within a group using dplyr

I want to count the number of distinct colors for each ID value, and I want the result to be the original data frame plus another column called count. I took the following code from another post asking the same question, but it doesn't seem to work for me:

    library(dplyr)

    ID <- c('A', 'A', 'A', 'B', 'B', 'B')
    color <- c('white', 'green', 'orange', 'white', 'green', 'green')

    d <- data.frame(ID, color)
    d %>%
      group_by(ID) %>%
      mutate(count = n_distinct(color))

By running this code I got the following result:

      ID    color  count
      <fct> <fct>  <int>
    1 A     white      3
    2 A     green      3
    3 A     orange     3
    4 B     white      3
    5 B     green      3
    6 B     green      3

whereas what I want is:

      ID    color  count
      <fct> <fct>  <int>
    1 A     white      3
    2 A     green      3
    3 A     orange     3
    4 B     white      2
    5 B     green      2
    6 B     green      2

Can someone tell me what I'm doing wrong or what is another way to do it using dplyr?

Upvotes: 2

Views: 7406

Answers (1)

Andrii

Reputation: 3043

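Calling mutate(count = n_distinct(color)) after group_by() is the right approach. Getting 3 on every row usually means that mutate() is being picked up from another package, such as plyr loaded after dplyr, whose mutate() ignores the grouping. Qualifying the call as dplyr::mutate() avoids that. Some notes: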

    library(dplyr)

    # 1. Data set
    df <- data.frame(
      id = c('A', 'A', 'A', 'B', 'B', 'B'),
      color = c('white', 'green', 'orange', 'white', 'green', 'green'))

    # 2. Desired result: number of distinct colors per id, repeated on every row
    df %>%
      group_by(id) %>%
      dplyr::mutate(count = n_distinct(color))

    # 3. Number of rows for each (id, color) pair, one row per pair
    df %>%
      group_by(id, color) %>%
      dplyr::mutate(count = n()) %>%
      unique()
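
If it is unclear whether masking is the problem, the following is a minimal sketch for checking it. It assumes dplyr is installed and reuses the data from the question. The names mutate_source, result and base_count are made up for illustration. The base R line at the end is only a cross-check, not part of the dplyr solution.

```r
library(dplyr)

d <- data.frame(
  ID    = c("A", "A", "A", "B", "B", "B"),
  color = c("white", "green", "orange", "white", "green", "green")
)

# Which package does a bare mutate() resolve to right now?
# "dplyr" is what we want; "plyr" would explain an ungrouped count.
mutate_source <- environmentName(environment(mutate))

# Namespace-qualified calls do not depend on the order of library() calls.
result <- d %>%
  dplyr::group_by(ID) %>%
  dplyr::mutate(count = dplyr::n_distinct(color)) %>%
  dplyr::ungroup()

# Base R cross-check: for the row indices of each ID,
# count the distinct colors found at those rows.
base_count <- ave(seq_along(d$color), d$ID,
                  FUN = function(i) length(unique(d$color[i])))
```

With this data, both result$count and base_count should be 3 for the A rows and 2 for the B rows. If mutate_source is anything other than "dplyr", either keep the dplyr:: prefix or load dplyr after the conflicting package.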

Upvotes: 1
