Reputation: 462
I want to calculate the number of distinct colors for each ID value, and I want the resulting data frame to be the original data frame plus another column called count. I got the following code from another post asking the same question, but it doesn't seem to work for me:
library(dplyr)

ID <- c('A', 'A', 'A', 'B', 'B', 'B')
color <- c('white', 'green', 'orange', 'white', 'green', 'green')
d <- data.frame(ID, color)

d %>%
  group_by(ID) %>%
  mutate(count = n_distinct(color))
By running this code I got the following result:
  ID    color  count
  <fct> <fct>  <int>
1 A     white      3
2 A     green      3
3 A     orange     3
4 B     white      3
5 B     green      3
6 B     green      3
whereas what I want is:
  ID    color  count
  <fct> <fct>  <int>
1 A     white      3
2 A     green      3
3 A     orange     3
4 B     white      2
5 B     green      2
6 B     green      2
Can someone tell me what I'm doing wrong or what is another way to do it using dplyr?
Upvotes: 2
Views: 7406
Reputation: 3043
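Some notes. Your code is correct for dplyr. The output you got is what you would see if mutate resolved to another package's function (plyr, for example, also exports mutate and ignores the grouping), so calling dplyr::mutate explicitly should give the per-ID counts: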
library(dplyr)

# 1. Data set
df <- data.frame(
  id = c('A', 'A', 'A', 'B', 'B', 'B'),
  color = c('white', 'green', 'orange', 'white', 'green', 'green'))

# 2. Desired result: number of distinct colors per id, repeated on every row
df %>%
  group_by(id) %>%
  dplyr::mutate(count = n_distinct(color))

# 3. Number of rows for each (id, color) pair, one row per pair
df %>%
  group_by(id, color) %>%
  dplyr::mutate(count = n()) %>%
  unique()
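If you want to check whether masking is really the cause, here is a minimal sketch. It assumes the problem comes from another attached package (plyr is only a guess, the question does not say which packages were loaded) and it reuses the question's column names. The `result` name is just for illustration.

```r
library(dplyr)

d <- data.frame(
  ID = c("A", "A", "A", "B", "B", "B"),
  color = c("white", "green", "orange", "white", "green", "green")
)

# Which package does a bare `mutate` resolve to in this session?
# It reports "dplyr" when nothing masks it; a different name (for
# example "plyr") would point to masking as the culprit.
environmentName(environment(mutate))

# Namespacing every verb makes the pipeline independent of the order
# in which packages were attached.
result <- d %>%
  dplyr::group_by(ID) %>%
  dplyr::mutate(count = dplyr::n_distinct(color)) %>%
  dplyr::ungroup()

result
```

Namespacing each call is more typing, but it avoids having to control the order of library() calls. Alternatively, attaching dplyr after the other package should also make the bare mutate resolve to dplyr.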
Upvotes: 1