Reputation: 223
I have an SQL query I'm trying to translate into R:
SELECT t."col1", t."col2", count(DISTINCT t."date")
FROM t
GROUP BY t."col1", t."col2"
where the R data frame looks something like this:
col1 col2 date
a 1 2016-01-09
a 1 2016-01-02
a 1 2016-01-02
b 1 2016-01-07
b 1 2016-01-03
b 1 2016-01-02
b 1 2016-01-07
b 2 2016-01-11
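For reference, a reproducible version of that sample data (dates kept as plain character strings here; the answers below refer to the same data frame as df):
t <- data.frame(
  col1 = c("a", "a", "a", "b", "b", "b", "b", "b"),
  col2 = c(1, 1, 1, 1, 1, 1, 1, 2),
  date = c("2016-01-09", "2016-01-02", "2016-01-02", "2016-01-07",
           "2016-01-03", "2016-01-02", "2016-01-07", "2016-01-11"),
  stringsAsFactors = FALSE
)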
The desired output should look something like this:
col1 col2 count
a 1 2
b 1 3
b 2 1
I've looked at the count function in the plyr package... but it doesn't take the number of distinct dates into account.
Running this
count(t, c("col1", "col2"))
produces this:
col1 col2 count
a 1 3
b 1 4
b 2 1
How can I replicate the behavior of the SQL query in R?
Upvotes: 0
Views: 146
Reputation: 378
If your data frame is named df, then:
# formula interface: count distinct dates within each (col1, col2) group
aggregate(date ~ ., data = df, FUN = function(x) length(unique(x)))
OUTPUT
col1 col2 date
1 a 1 2
2 b 1 3
3 b 2 1
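If you want the third column to be called count, as in the desired output, a small follow-up sketch (still assuming the data frame is named df):
res <- aggregate(date ~ ., data = df, FUN = function(x) length(unique(x)))
names(res)[names(res) == "date"] <- "count"  # rename the aggregated column
res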
Upvotes: 0
Reputation: 3557
Assuming the row-level (unaggregated) data is in a data frame called df:
library(dplyr)

df %>%
  group_by(col1, col2) %>%
  summarise(distinct_ct = n_distinct(date))
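An equivalent sketch, assuming a dplyr version where count() accepts a name argument, is to drop duplicate rows first and then tally them:
df %>%
  distinct(col1, col2, date) %>%
  count(col1, col2, name = "distinct_ct")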
Upvotes: 3
Reputation: 887048
Here is an option using data.table:
library(data.table)
setDT(df)[, .(distinct_ct = uniqueN(date)), by = .(col1, col2)]
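As a usage sketch (setDT() converts df by reference; as.data.table() makes a copy if you prefer to leave df untouched), the result on the sample data matches the desired output:
library(data.table)
dt <- as.data.table(df)   # copy instead of converting df in place
dt[, .(distinct_ct = uniqueN(date)), by = .(col1, col2)]
#    col1 col2 distinct_ct
# 1:    a    1           2
# 2:    b    1           3
# 3:    b    2           1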
Upvotes: 2