stats_noob

Reputation: 5897

Evaluation Error : Need at least one column for 'n_distinct()'

I am using the R programming language. I have a data frame (my_file) with 2 columns: my_date (e.g. 2000-01-15, stored as a factor) and blood_type (also a factor). I am trying to use the dplyr library to count distinct values by group (by month).

I figured out how to make non-distinct counts:

library(dplyr)

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n())

But this does not work for distinct counts:

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct())

Evaluation Error : Need at least one column for 'n_distinct()'

I tried to explicitly reference the column, but this produces an empty file:

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(my_file$blood_type))

Can someone please show me what I am doing wrong?

Thanks

Upvotes: 0

Views: 106

Answers (2)

akrun

Reputation: 886938

Using data.table

library(data.table)
# j computes the distinct count; the second .() is the `by` argument (group by month)
setDT(my_file)[, .(count = uniqueN(blood_type)),
        .(month = format(as.IDate(my_date), '%Y-%m'))]

Upvotes: 1

Ronak Shah

Reputation: 388797

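If you want to count the distinct blood_type values for each month, don't include blood_type in group_by. When it is a grouping variable, every group contains exactly one blood_type, so the distinct count is always 1. Also refer to the column by its bare name rather than my_file$blood_type, which pulls the whole ungrouped column. Try: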

library(dplyr)

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(blood_type))
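
To check this on something small, here is a minimal sketch using a made-up five-row data frame. The values are hypothetical and only stand in for the real my_file; both columns are factors, as in the question. It also runs the my_file$blood_type variant next to it to show how the two differ under grouping.

```r
library(dplyr)

# Hypothetical toy data (not the asker's file); both columns are factors
my_file <- data.frame(
  my_date    = factor(c("2000-01-15", "2000-01-20", "2000-01-25",
                        "2000-02-03", "2000-02-10")),
  blood_type = factor(c("A", "A", "B", "O", "O"))
)

# Bare column name: n_distinct() sees only the rows of the current month
new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(blood_type))

# my_file$blood_type: the full column is used in every group,
# so each month gets the overall number of distinct blood types
whole_column <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(my_file$blood_type))

print(new_file)
print(whole_column)
```

With this toy data, new_file has a count of 2 for 2000-01 (A and B) and 1 for 2000-02 (O), while whole_column reports 3 for both months because the grouping is bypassed.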

Upvotes: 1
