ZiiMakc
ZiiMakc

Reputation: 36946

Getting sum of votes for each rating and average. Postgres

I have table:

| book_id | reader_id | rating
    1     |     2     |   1
    2     |     3     |   2
    2     |     2     |   5
    2     |     4     |   10

One user can vote only once from 1 to 10.

Question 1: What is the best way to get average book rating and amount of votes for every rating? Like:

 avr                     ratings
"3.6"   "2" "0" "0" "0" "0" "0" "0" "0" "1" "0"

For now i do like this:

SELECT  
    AVG(rating),
    sum(case when rating  = 1 then 1  else 0 end) as r1,
    sum(case when rating  = 2 then 1  else 0 end) as r2,
    sum(case when rating  = 3 then 1  else 0 end) as r3,
    sum(case when rating  = 4 then 1  else 0 end) as r4,
    sum(case when rating  = 5 then 1  else 0 end) as r5,
    sum(case when rating  = 6 then 1  else 0 end) as r6,
    sum(case when rating  = 7 then 1  else 0 end) as r7,
    sum(case when rating  = 8 then 1  else 0 end) as r8,
    sum(case when rating  = 9 then 1  else 0 end) as r9,
    sum(case when rating  = 10 then 1  else 0 end) as r10
FROM books_rates
WHERE book_id=2;

Question 2: why result of this query is object?

{
       avg: '1.00000000000000000000',
       r1: '3',
       r2: '0',
       r3: '0',
       r4: '0',
       r5: '0',
       r6: '0',
       r7: '0',
       r8: '0',
       r9: '0',
       r10: '0'
     }

Question 3: will it be better if i create column in book table where i will store result of query above so that on every book load there will be no need to do this heavy(is it?) query (of course this result column will be updating if someone give a new rate)?

Upvotes: 1

Views: 449

Answers (1)

Mikhail Antonov
Mikhail Antonov

Reputation: 1367

Given that you have clean data in your table, meaning that you already have constraints like unique key on book+reader to forbid the same user vote twice for a book and a constraint on rating field which won't let you insert something other than an [1..10] integer, the following usage of window functions might work:

create table ratings (
  book_id int,
  reader_id int,
  rating int
);

insert into ratings (book_id, reader_id, rating) values 
(1,2,1),
(1,10,1),
(1,101,2),
(2,3,2),
(2,2,5),
(2,4,10);


select book_id, reader_id, rating, count(reader_id) over (partition by book_id, rating) as same_rating_votes, avg(rating) over(partition by book_id) as book_avg_rating from ratings;

select sq.book_id, sq.rating, max(sq.same_rating_votes) as     same_rating_votes, max(sq.book_avg_rating) as book_avg from (
    select book_id, reader_id, rating, count(reader_id) over (partition by book_id, rating) as same_rating_votes, avg(rating) over(partition by book_id) as book_avg_rating from ratings ) as sq
group by sq.book_id, sq.rating;

http://www.sqlfiddle.com/#!17/eb4ea/2

http://www.sqlfiddle.com/#!17/eb4ea/7

In each row you will have a reference to a book, reader, his vote, an average rating for the book and an amount of same votes for this book. Maybe this can be further aggregated with array_agg or something if you absolutely need to have the result in one line.

Speaking of question 3, I believe if you have millions of books and votes it would probably be a reasonable idea to calculate everything in a materialized view and recalculate it at night.

Upvotes: 1

Related Questions