HollowBastion

Reputation: 223

How to group by and count distinct occurrences of values in a particular column in R?

I have an SQL query I'm trying to translate into R:

SELECT t."col1", t."col2", count(DISTINCT t."date")
FROM t
GROUP BY t."col1", t."col2"

where the R data frame looks something like this:

col1 col2 date
a 1 2016-01-09
a 1 2016-01-02
a 1 2016-01-02
b 1 2016-01-07
b 1 2016-01-03
b 1 2016-01-02
b 1 2016-01-07
b 2 2016-01-11

The expected output is something like this:

col1 col2 count
a 1 2
b 1 3
b 2 1

I've looked at the count function in the plyr package, but it counts rows per group rather than the number of distinct dates.

Running this

count(t, c("col1", "col2"))

produces this:

col1 col2 count
a 1 3
b 1 4
b 2 1

How can I replicate the behavior of the SQL query in R?

Upvotes: 0

Views: 146

Answers (3)

Alan Gómez

Reputation: 378

If your data frame is named df, then:

aggregate(date ~ ., data = df, FUN = function(x) length(unique(x)))

OUTPUT

  col1 col2 date
1    a    1    2
2    b    1    3
3    b    2    1
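
For anyone who wants to reproduce this, here is a minimal base R sketch built from the sample rows in the question. The names df and res are only illustrative, and the grouping columns are spelled out in the formula instead of using the dot shorthand:

```r
# Sample data copied from the question (dates kept as plain strings)
df <- data.frame(
  col1 = c("a", "a", "a", "b", "b", "b", "b", "b"),
  col2 = c(1, 1, 1, 1, 1, 1, 1, 2),
  date = c("2016-01-09", "2016-01-02", "2016-01-02", "2016-01-07",
           "2016-01-03", "2016-01-02", "2016-01-07", "2016-01-11")
)

# For each (col1, col2) group, count the number of distinct dates
res <- aggregate(date ~ col1 + col2, data = df,
                 FUN = function(x) length(unique(x)))
print(res)
```

The date column of res then holds the distinct-date count per group, which plays the role of count(DISTINCT t."date") in the SQL query.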

Upvotes: 0

Jacob

Reputation: 3557

Assuming you have the atomic level data in a data frame called df:

library(dplyr)
df %>% 
  group_by(col1, col2) %>%
  summarise(distinct_ct = n_distinct(date))
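
As a rough end-to-end check, the same approach can be run against the sample rows from the question, using its column names col1 and col2. This is only a sketch: the names df and res are illustrative, and the .groups = "drop" argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Sample data copied from the question (dates kept as plain strings)
df <- data.frame(
  col1 = c("a", "a", "a", "b", "b", "b", "b", "b"),
  col2 = c(1, 1, 1, 1, 1, 1, 1, 2),
  date = c("2016-01-09", "2016-01-02", "2016-01-02", "2016-01-07",
           "2016-01-03", "2016-01-02", "2016-01-07", "2016-01-11")
)

# One row per (col1, col2) group with the number of distinct dates
res <- df %>%
  group_by(col1, col2) %>%
  summarise(distinct_ct = n_distinct(date), .groups = "drop")
print(res)
```

Dropping the grouping at the end returns an ungrouped tibble, which is usually easier to work with in later steps.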

Upvotes: 3

akrun

Reputation: 887048

Here is an option using data.table

library(data.table)
setDT(df)[, .(distinct_ct = uniqueN(date)), by = .(col1, col2)]

Upvotes: 2
