Anna
Anna

Reputation: 895

Count n_distinct with a condition

I have the following dataframe:

df<-data.frame(Name= c(rep("A",3), rep("B",5)), Month = c(1,2,3,1,2,3,3,3), Volume = c(50,0,50,50,50,50,50,50))

and I would like to update a column "Count" to represent the number of unique Months per Name:

df<-df%>%
  group_by(Name) %>%
  mutate(Count = n_distinct(Month))

However, how can I add a filter such that I only count Months whose corresponding Value > 0? This is my desired output:

df<-data.frame(Name= c(rep("A",3), rep("B",5)), Month = c(1,2,3,1,2,3,3,3), Volume = c(50,0,50,50,50,50,50,50), Count = c(2,2,2,3,3,3,3,3))

Thank you!

Upvotes: 1

Views: 3720

Answers (3)

James Thomas Durant
James Thomas Durant

Reputation: 305

Try:

df%>%
  group_by(Name) %>%
  mutate(Count = n_unique(Month[Volume >0]))

Upvotes: 0

Andrew Gustar
Andrew Gustar

Reputation: 18425

You just need to add a condition to Month...

df <- df %>%
      group_by(Name) %>%
      mutate(Count = n_distinct(Month[Volume>0]))

df
# A tibble: 8 x 4
# Groups:   Name [2]
    Name Month Volume Count
  <fctr> <dbl>  <dbl> <int>
1      A     1     50     2
2      A     2      0     2
3      A     3     50     2
4      B     1     50     3
5      B     2     50     3
6      B     3     50     3
7      B     3     50     3
8      B     3     50     3

Upvotes: 1

bouncyball
bouncyball

Reputation: 10771

Rather than using the n_distinct function, we can use the duplicated function as well as including Volume > 0 in the logical expression:

df %>%
    group_by(Name) %>%
    mutate(Count = sum(!duplicated(Month) & Volume > 0)) # not duplicated, Volume > 0

    Name Month Volume Count
  <fctr> <dbl>  <dbl> <int>
1      A     1     50     2
2      A     2      0     2
3      A     3     50     2
4      B     1     50     3
5      B     2     50     3
6      B     3     50     3
7      B     3     50     3
8      B     3     50     3

Upvotes: 1

Related Questions