Reputation: 2443
I have a data frame with many categorical columns. I would like to count the number of distinct categories not equal to "bla". So for example:
> d1
# A tibble: 5 x 2
x y
<chr> <chr>
1 yellow A
2 green A
3 green bla
4 blue B
5 bla bla
How can I modify dplyr's
d1 %>% summarise_all(n_distinct)
to exclude the category "bla"? In this case, the answer should be 3 for column x
and 2 for column y
.
Upvotes: 2
Views: 69
Reputation: 388982
We can use filter_all
to filter
the rows from all the columns and then use n_distinct
to get length of unique values.
library(dplyr)
d1 %>%
filter_all(all_vars(. != "bla")) %>%
summarise_all(n_distinct)
# x y
#1 3 2
Upvotes: 3
Reputation: 56149
Using base::lengths():
lengths(lapply(d1, function(i) unique(i[ i != "bla" ])))
# x y
# 3 2
Upvotes: 1
Reputation: 320
Using data table
library(data.table)
d1 <- data.table(d1)
d1[!y %like% 'bla', .(count = .N, distinct = uniqueN(x)), by = .(y)]
Upvotes: 0