Reputation: 895
I have the following dataframe:
df<-data.frame(Name= c(rep("A",3), rep("B",5)), Month = c(1,2,3,1,2,3,3,3), Volume = c(50,0,50,50,50,50,50,50))
and I would like to update a column "Count" to represent the number of unique Months per Name:
df<-df%>%
group_by(Name) %>%
mutate(Count = n_distinct(Month))
However, how can I add a filter such that I only count Months whose corresponding Value > 0? This is my desired output:
df<-data.frame(Name= c(rep("A",3), rep("B",5)), Month = c(1,2,3,1,2,3,3,3), Volume = c(50,0,50,50,50,50,50,50), Count = c(2,2,2,3,3,3,3,3))
Thank you!
Upvotes: 1
Views: 3720
Reputation: 305
Try:
df%>%
group_by(Name) %>%
mutate(Count = n_unique(Month[Volume >0]))
Upvotes: 0
Reputation: 18425
You just need to add a condition to Month
...
df <- df %>%
group_by(Name) %>%
mutate(Count = n_distinct(Month[Volume>0]))
df
# A tibble: 8 x 4
# Groups: Name [2]
Name Month Volume Count
<fctr> <dbl> <dbl> <int>
1 A 1 50 2
2 A 2 0 2
3 A 3 50 2
4 B 1 50 3
5 B 2 50 3
6 B 3 50 3
7 B 3 50 3
8 B 3 50 3
Upvotes: 1
Reputation: 10771
Rather than using the n_distinct
function, we can use the duplicated
function as well as including Volume > 0
in the logical expression:
df %>%
group_by(Name) %>%
mutate(Count = sum(!duplicated(Month) & Volume > 0)) # not duplicated, Volume > 0
Name Month Volume Count
<fctr> <dbl> <dbl> <int>
1 A 1 50 2
2 A 2 0 2
3 A 3 50 2
4 B 1 50 3
5 B 2 50 3
6 B 3 50 3
7 B 3 50 3
8 B 3 50 3
Upvotes: 1