Joep_S

Reputation: 537

r: combine filter with n_distinct in data frame

Simple question. Given the data frame below, I want to count distinct IDs: once for all records and once after filtering on status. However, the %>% pipe doesn't seem to work here. I just want a single value as output (10 for total, 5 for closed), not a data frame. Neither of the commented-out lines works.

library(dplyr)

dat <- data.frame(ID = as.factor(1:10),
                  status = as.factor(rep(c("open", "closed"))))


total <- n_distinct(dat$ID)
#closed <- dat %>% filter(status == "closed") %>% n_distinct(dat$ID)
#closed <- dat %>% filter(status == "closed") %>% n_distinct(ID)

Upvotes: 0

Views: 693

Answers (2)

akrun

Reputation: 887571

An option with data.table (note that setDT converts dat to a data.table by reference):

library(data.table)
setDT(dat)[status == "closed"][, .(n = uniqueN(ID))]
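Since the question asks for a single value rather than a table, a minimal sketch of a variant that returns a bare integer is shown here. It assumes the same dat as in the question; the names dt and closed are only illustrative, and as.data.table is used so the original data frame is left untouched.

```r
library(data.table)

# Same example data as in the question
dat <- data.frame(ID = as.factor(1:10),
                  status = as.factor(rep(c("open", "closed"), 5)))

# Copy into a data.table instead of converting dat by reference
dt <- as.data.table(dat)

# Filter in i, compute in j; an unwrapped j expression returns a plain vector
closed <- dt[status == "closed", uniqueN(ID)]
closed
# [1] 5
```

Wrapping the j expression in .() would return a one-row data.table instead of a plain integer.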

Upvotes: 0

Ronak Shah

Reputation: 389175

n_distinct expects vectors as input, but the pipe passes the filtered data frame as its first argument. Instead, you can do:

library(dplyr)

dat %>% 
  filter(status == "closed") %>%
  summarise(n = n_distinct(ID))

#  n
#1 5

Or without using filter:

dat %>% summarise(n = n_distinct(ID[status == "closed"]))

You can add %>% pull(n) to either of the above if you want a vector back rather than a data frame.
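For completeness, here is a sketch of the full pipeline with pull, which yields the single value the question asks for. It assumes the same dat as in the question; closed_alt is just an illustrative name for a second variant that pulls the ID column out first and then counts, so that n_distinct receives a vector.

```r
library(dplyr)

# Same example data as in the question
dat <- data.frame(ID = as.factor(1:10),
                  status = as.factor(rep(c("open", "closed"), 5)))

total <- n_distinct(dat$ID)

# Summarise to a one-row data frame, then extract the column as a vector
closed <- dat %>%
  filter(status == "closed") %>%
  summarise(n = n_distinct(ID)) %>%
  pull(n)

# Alternative: extract the ID vector first, then count its distinct values
closed_alt <- dat %>%
  filter(status == "closed") %>%
  pull(ID) %>%
  n_distinct()

total
# [1] 10
closed
# [1] 5
```

Both variants hand n_distinct a vector rather than the whole data frame, which is why they work where the commented-out attempts in the question do not.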

Upvotes: 1
