Reputation: 1084
Is it possible to count distinct values in conjunction with window functions like OVER(PARTITION BY id)? Currently my query is as follows:
SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
congestion.id_element,
ROW_NUMBER() OVER(
PARTITION BY congestion.id_element
ORDER BY congestion.date),
COUNT(DISTINCT congestion.week_nb) OVER(
PARTITION BY congestion.id_element
) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
However, when I try to execute the query I get the following error:
"COUNT(DISTINCT": "DISTINCT is not implemented for window functions"
Upvotes: 32
Views: 65173
Reputation: 24
That's right: "DISTINCT is not implemented for window functions".
Another solution is to use DENSE_RANK() OVER(PARTITION BY ... ORDER BY ...) and MAX() OVER(PARTITION BY ...) instead.
For example, to get the effect of COUNT(DISTINCT week_nb) OVER(PARTITION BY id_element) week_count:
SELECT s.date,
s.week_nb,
s.id_congestion,
s.id_element,
MAX(s.week_dense_rank) OVER(PARTITION BY s.id_element) week_count
FROM
(
SELECT date,
week_nb,
id_congestion,
id_element,
DENSE_RANK() OVER(PARTITION BY id_element ORDER BY week_nb) week_dense_rank
FROM congestion
WHERE date >= '2014.01.01'
AND date <= '2014.12.31'
) s
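To see why this works, here is a minimal self-contained sketch. The `sample` CTE and its values are made up for illustration and only stand in for the congestion table. Equal week_nb values share a dense rank, so the highest rank in each partition equals the number of distinct values in it.

```sql
-- Hypothetical sample data standing in for the congestion table.
WITH sample (id_element, week_nb) AS (
    VALUES (1, 1), (1, 1),
           (2, 2), (2, 2), (2, 4),
           (3, 3), (3, 5), (3, 6)
),
ranked AS (
    SELECT id_element,
           week_nb,
           -- Rows with the same week_nb get the same rank, with no gaps.
           DENSE_RANK() OVER (PARTITION BY id_element ORDER BY week_nb) AS week_dense_rank
    FROM sample
)
SELECT id_element,
       week_nb,
       -- The largest rank per element is its distinct week count.
       MAX(week_dense_rank) OVER (PARTITION BY id_element) AS week_count
FROM ranked
ORDER BY id_element, week_nb;
```

With this data, week_count is 1 for element 1, 2 for element 2 and 3 for element 3. One difference to keep in mind: DENSE_RANK() also assigns a rank to NULL, so a NULL week_nb is counted as one more value, whereas COUNT(DISTINCT ...) ignores NULLs.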
Upvotes: 0
Reputation: 69
If you are counting distinct numbers, you can use other aggregate functions to achieve the same effect, like so:
select
initial.id,
initial.val,
joined.id,
array_length(uniq(sort(array_agg(joined.some_number) over (partition by initial.id))), 1) as distinct_count
from
(values (1,'a'), (2,'b'), (3,'c')) initial(id, val)
left join (values (1, 1),
(1, 1),
(1, 3),
(2, 2),
(2, 2),
(3, 3),
(3, 3),
(3, 3),
(3, 4)) joined(id, some_number) on joined.id = initial.id
;
id val id distinct_count
1 a 1 2
1 a 1 2
1 a 1 2
2 b 2 1
2 b 2 1
3 c 3 2
3 c 3 2
3 c 3 2
3 c 3 2
If you get an error message saying uniq doesn't exist, it's in the intarray extension. Try running CREATE EXTENSION intarray first and then try again. If you are still having trouble, search the PostgreSQL docs for info on the intarray extension. "[The] module is considered 'trusted', that is, it can be installed by non-superusers who have CREATE privilege on the current database."
Upvotes: 1
Reputation: 1269443
I find that the easiest way is to use a subquery/CTE and conditional aggregation:
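SELECT
c.date,
c.week_nb,
c.id_congestion,
c.id_element,
ROW_NUMBER() OVER (PARTITION BY c.id_element ORDER BY c.date),
-- Sum the per-week flags to count the distinct weeks of each element
SUM(
CASE WHEN c.seqnum = 1 THEN
1
ELSE
0
END) OVER (PARTITION BY c.id_element) AS week_count
FROM (
SELECT
c.*,
-- seqnum = 1 marks exactly one row per (id_element, week_nb) pair
ROW_NUMBER() OVER (PARTITION BY c.id_element, c.week_nb ORDER BY c.date) AS seqnum
FROM
congestion c
-- Filter before numbering, so the seqnum = 1 row is never filtered away
WHERE
c.date >= '2014.01.01'
AND c.date <= '2014.12.31') c
ORDER BY
id_element,
date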
Upvotes: 2
Reputation: 8796
Since this is the first result that pops up from Google, I'll add this reproducible example, similar to Gordon's answer:
Let's first start with creating a sample table:
WITH test as
(
SELECT *
FROM (VALUES
(1, 'A'),
(1, 'A'),
(2, 'B'),
(2, 'B'),
(2, 'D'),
(3, 'C'),
(3, 'C'),
(3, 'C'),
(3, 'E'),
(3, 'F')) AS t (id_element, week_nb)
)
select * from test
This yields:
id_element week_nb
1 A
1 A
2 B
2 B
2 D
3 C
3 C
3 C
3 E
3 F
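Then, keeping the same WITH test AS (...) clause in front (a CTE only exists for the statement that defines it), doing something like: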
select
id_element,
week_nb,
sum(first_row_in_sequence) over (partition by id_element) as distinct_week_nb_count
from
(
select
id_element,
week_nb,
case when row_number() over (partition by id_element, week_nb) = 1 then 1 else 0 end as first_row_in_sequence
from test
) as sub
yields
id_element week_nb distinct_week_nb_count
1 A 1
1 A 1
2 B 2
2 B 2
2 D 2
3 C 3
3 C 3
3 C 3
3 E 3
3 F 3
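As a cross-check on the same sample data, here is a sketch of another well-known variant that needs no subquery: add the ascending and descending dense ranks and subtract one. For the i-th of k distinct values, the ascending rank is i and the descending rank is k - i + 1, so the sum minus one is always k.

```sql
WITH test (id_element, week_nb) AS (
    VALUES (1, 'A'), (1, 'A'),
           (2, 'B'), (2, 'B'), (2, 'D'),
           (3, 'C'), (3, 'C'), (3, 'C'), (3, 'E'), (3, 'F')
)
SELECT id_element,
       week_nb,
       -- ascending rank + descending rank - 1 = number of distinct values
       DENSE_RANK() OVER (PARTITION BY id_element ORDER BY week_nb)
     + DENSE_RANK() OVER (PARTITION BY id_element ORDER BY week_nb DESC)
     - 1 AS distinct_week_nb_count
FROM test
ORDER BY id_element, week_nb;
```

This returns the same counts as the table above (1, 2 and 3). As with any DENSE_RANK() approach, a NULL week_nb would be counted as one more value.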
Upvotes: 2
Reputation: 11267
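Make the partition smaller, to the point where the counted field has no duplicates within it. A plain COUNT then equals the distinct count, but note that it is computed per smaller partition (here id_element and id_congestion), not per id_element alone: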
SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
congestion.id_element,
ROW_NUMBER() OVER(
PARTITION BY congestion.id_element
ORDER BY congestion.date),
COUNT(congestion.week_nb) -- remove distinct
OVER(
PARTITION BY congestion.id_element,
-- add new fields which will restart counter in case duplication
congestion.id_congestion
) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
Upvotes: 1
Reputation: 4453
No, as the error message states, DISTINCT is not implemented for window functions. Applying the info from this link to your case, you could use something like:
WITH uniques AS (
SELECT congestion.id_element, COUNT(DISTINCT congestion.week_nb) AS unique_references
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
GROUP BY congestion.id_element
)
SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
congestion.id_element,
ROW_NUMBER() OVER(
PARTITION BY congestion.id_element
ORDER BY congestion.date),
uniques.unique_references AS week_count
FROM congestion
JOIN uniques USING (id_element)
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
Depending on the situation, you could also put a subquery straight into the SELECT list:
SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
congestion.id_element,
ROW_NUMBER() OVER(
PARTITION BY congestion.id_element
ORDER BY congestion.date),
(SELECT COUNT(DISTINCT dist_con.week_nb)
FROM congestion AS dist_con
WHERE dist_con.date >= '2014.01.01'
AND dist_con.date <= '2014.12.31'
AND dist_con.id_element = congestion.id_element) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
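To try the join approach without the real table, here is a minimal sketch on made-up data. The `sample` CTE and its values are assumptions for illustration; the structure follows the uniques CTE above, with the date filter left out for brevity.

```sql
-- Hypothetical sample data standing in for the congestion table.
WITH sample (id_element, week_nb) AS (
    VALUES (1, 1), (1, 1),
           (2, 2), (2, 2), (2, 4),
           (3, 3), (3, 5), (3, 6)
),
uniques AS (
    -- An ordinary aggregate, where COUNT(DISTINCT ...) is supported.
    SELECT id_element, COUNT(DISTINCT week_nb) AS week_count
    FROM sample
    GROUP BY id_element
)
SELECT s.id_element,
       s.week_nb,
       u.week_count
FROM sample AS s
JOIN uniques AS u USING (id_element)
ORDER BY s.id_element, s.week_nb;
```

Every row of element 1 gets week_count 1, element 2 gets 2 and element 3 gets 3. Any filter applied to the outer query has to be repeated in the aggregating CTE, as the date conditions are in the queries above; otherwise the two would count over different row sets.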
Upvotes: 10