Reputation: 5897
I am using R. I have a data frame (my_file) with two columns: my_date (e.g. 2000-01-15) and blood_type, both stored as factors. I am trying to use the dplyr library to count the distinct values within each group (by month).
I figured out how to make non-distinct counts:
library(dplyr)
new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n())
But this does not work for distinct counts:
new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct())
Evaluation Error : Need at least one column for 'n_distinct()'
I tried to explicitly reference the column, but this produces an empty file:
new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(my_file$blood_type))
Can someone please show me what I am doing wrong?
Thanks
Upvotes: 0
Views: 106
Reputation: 886938
Using data.table
library(data.table)
# count distinct blood types per month; the grouping expression goes in `by`
setDT(my_file)[, .(count = uniqueN(blood_type)),
               by = .(month = format(as.IDate(my_date), '%Y-%m'))]
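For illustration, here is a minimal sketch of the same data.table approach on a made-up table. The name dt and its values are hypothetical stand-ins for my_file, and the factor dates are converted via as.character() first, purely as a cautious assumption.

```r
library(data.table)

# Hypothetical toy data standing in for my_file (both columns are factors)
dt <- data.table(
  my_date    = factor(c("2000-01-15", "2000-01-20", "2000-01-25",
                        "2000-02-03", "2000-02-10")),
  blood_type = factor(c("A", "A", "B", "O", "O"))
)

# One row per month, with the number of distinct blood types seen in that month
res <- dt[, .(count = uniqueN(blood_type)),
          by = .(month = format(as.IDate(as.character(my_date)), "%Y-%m"))]

print(res)
```

With this toy data, January contains two distinct blood types (A and B) and February contains one (O).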
Upvotes: 1
Reputation: 388797
If you want to count distinct blood_type for each month, don't include it in group_by. Try:
library(dplyr)
new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(blood_type))
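To see why the grouping matters, here is a small sketch on made-up data (the values below are hypothetical, not taken from the question). When blood_type is itself a grouping column, each group contains exactly one blood type, so the distinct count within a group can only be 1. Grouping by month alone gives the count that was presumably intended.

```r
library(dplyr)

# Hypothetical toy data standing in for my_file (both columns are factors)
my_file <- data.frame(
  my_date    = factor(c("2000-01-15", "2000-01-20", "2000-01-25",
                        "2000-02-03", "2000-02-10")),
  blood_type = factor(c("A", "A", "B", "O", "O"))
)

# Grouping by blood_type as well: every group holds a single blood type,
# so the distinct count is 1 in every row
by_type_and_month <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(blood_type))

# Grouping by month only: counts the distinct blood types seen in each month
new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(blood_type))

print(by_type_and_month)
print(new_file)
```

Note that inside summarise() the column is referred to by its bare name. Writing my_file$blood_type would bypass the grouping and evaluate the whole column for every group.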
Upvotes: 1