Reputation: 4473
I have a dataset like this where some rows are useful, but corrupted.
create table pages (
page varchar,
cat varchar,
hits int
);
insert into pages values
(1, 'asdf', 1),
(1, 'fdsa', 2),
(1, 'Apples', 321),
(2, 'gwegr', 30),
(2, 'hsgsdf', 2),
(2, 'Bananas', 321);
I want to know the correct category for each page, and the total hits. The correct category is the one with the most hits. I'd like to have a dataset like:
page | category | sum_of_hits
-----------------------------
1 | Apples | 324
2 | Bananas | 353
The furthest I can get is:
SELECT page,
last_value(cat) over (partition BY page ORDER BY hits) as category,
sum(hits) as sum_of_hits
FROM pages
GROUP BY 1, 2
But it is erroring: ERROR: column "pages.hits" must appear in the GROUP BY clause or be used in an aggregate function Position: 83
.
I tried putting the hits in an aggregate - ORDER BY max(hits)
but that doesn't make sense and isn't what I want.
Fiddle: http://sqlfiddle.com/#!17/cb3c2/17
Upvotes: 3
Views: 355
Reputation: 1270713
You seem to want to take the sum of hits and the maximum. This is easy enough:
select page, sum(hits) as total_hits,
max(case when seqnum = 1 then category end) as category
from (select p.*,
row_number() over (partition by page order by hits desc) as seqnum
from pages p
) p
group by page;
Upvotes: 0
Reputation: 3463
Use a subquery:
select page, cat, hits from
(select page, cat, hits
,max(hits) over (partition by page) as m_hits
from pages) t
where m_hits = hits
Upvotes: 0
Reputation: 10411
There are two problems here:
First is the usage of the last_value. Read a note in the Postgres documentation about the last value:
Note that first_value, last_value, and nth_value consider only the rows within the "window frame", which by default contains the rows from the start of the partition through the last peer of the current row. This is likely to give unhelpful results for nth_value and particularly last_value. You can redefine the frame as being the whole partition by adding ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to the OVER clause. See Section 4.2.8 for more information.
I recommend you convert it to first_value:
SELECT page,
first_value(cat) over (partition BY page ORDER BY hits DESC) as category,
hits
FROM pages
The second problem is that you can't use the window function directly in the GROUP BY clause, you need to use a subquery or a cte:
select page, category,
sum(hits)
from (
SELECT page,
first_value(cat) over (partition BY page ORDER BY hits DESC) as category,
hits
FROM pages
) a
GROUP BY 1, 2
Upvotes: 2
Reputation: 121804
Use the window function first_value()
over the reversed order of hits
in a derived table (a subquery in FROM clause):
select
page,
category,
sum(hits) as sum_of_hits
from (
select
page,
first_value(cat) over (partition by page order by hits desc) as category,
hits
from pages
) s
group by 1, 2
order by 1;
Upvotes: 1