Reputation: 8060
I am currently building a scoring table for my items.
Each item has a score so the database (postgres) can sort the item by scores and return them to the user.
Currently, the total score of the product is determined by the following formula:
Total = 0.5 * freshness + 0.25 * popularity + 0.25 * relevance
Process A, B, C will run over a couple of hours and produce (item_id, score, type) where type can be either "fresh", "popular", or "relevant".
Note that I have to persist these values as they are produced by different processes.
What do I need to do such that I can execute SELECT * FROM items JOIN scores ON items.id == scores.item_id ORDER BY <total_score ??> DESC LIMIT 10 OFFSET 0;
EDIT
An obvious answer is to have another process generate type = total
for all the items. This works, but it's a pain in the ass because every change in either one of those scores will require total to be updated. In addition, it pretty much can increase my data storage from 25% to 100%. I don't consider this an optimal solution because there is quite a bit of up keeping required to incorporate this.
UPDATE
This is my scores table:
Column | Type | Modifiers | Storage | Description
---------------+-----------------------------+-----------------------------------------------------------+----------+-------------
created_at | timestamp without time zone | | plain |
updated_at | timestamp without time zone | | plain |
id | integer | not null default | plain |
score | double precision | not null | plain |
type | character varying | not null | extended |
Upvotes: 0
Views: 154
Reputation: 346
Here's another cool way to do this using a virtual column, as described here:
First, create a view to aggregate the scores for each item:
CREATE OR REPLACE VIEW vw_scores_rollup AS
SELECT id,
SUM(CASE WHEN type = 'freshness' THEN score ELSE 0 END) AS freshness,
SUM(CASE WHEN type = 'popularity' THEN score ELSE 0 END) AS popularity,
SUM(CASE WHEN type = 'relevance' THEN score ELSE 0 END) AS relevance
FROM scores
GROUP BY id;
Next, this function takes the source table/view as the argument.
CREATE OR REPLACE FUNCTION total(vw_scores_rollup) RETURNS numeric AS
$BODY$
SELECT 0.5 * COALESCE($1.freshness, 0) + 0.25 * COALESCE($1.popularity, 0) + 0.25 * COALESCE($1.relevance, 0);
$BODY$
LANGUAGE sql;
To access:
SELECT *, s.total
FROM items i
JOIN vw_scores_rollup s USING (id)
ORDER BY s.total DESC
LIMIT 10 OFFSET 0;
This is a neat trick and provides a straightforward way to access the total.
Upvotes: 2
Reputation: 125204
No need for multiple joins. Just aggregate before joining.
select i.*, s.total
from
items i
inner join
(
select
id,
coalesce(sum((type = 'fresh')::integer * score * 0.5), 0)
+ coalesce(sum((type = 'popularity')::integer * score * 0.25), 0)
+ coalesce(sum((type = 'relevance')::integer * score * 0.25), 0)
total
from scores
group by id
) s on i.id = s.id
order by s.total desc
limit 10
Upvotes: 0
Reputation: 52107
Here you go...
SELECT item_id, SUM(S) TOTAL
FROM (
SELECT item_id, 0.5 * score S
FROM scores
WHERE type = 'freshness'
UNION ALL
SELECT item_id, 0.25 * score
FROM scores
WHERE type IN ('popularity', 'relevance')
) Q1
GROUP BY item_id
ORDER BY TOTAL DESC;
This will give you item IDs and the associated total scores (sorted by from highest to lowest).
You can easily JOIN that with the items
table if necessary, LIMIT to top 10 etc...
Another possibility...
SELECT
item_id,
SUM (
CASE type
WHEN 'freshness' THEN 0.5
WHEN 'popularity' THEN 0.25
WHEN 'relevance' THEN 0.25
END
* score
) TOTAL
FROM scores
GROUP BY item_id
ORDER BY TOTAL DESC;
Upvotes: 0
Reputation: 424983
Order by the expression for the total score, joining to each score row separately so that all score types can be used in the calculation.
SELECT * FROM items
LEFT JOIN scores f ON items.id = f.item_id and type = 'freshness'
LEFT JOIN scores p ON items.id = p.item_id and type = 'popularity'
LEFT JOIN scores r ON items.id = r.item_id and type = 'relevance'
ORDER BY
0.5 * COALESCE(f.score, 0) +
0.25 * COALESCE((p.score, 0) +
0.25 * COALESCE(r.score) DESC
LIMIT 10 OFFSET 0
There is no need to store the total.
Note the use of LEFT JOIN
, which means that items without particular score types will still be returned. I used COALESCE()
to give a score of zero for any missing score types.
You may think this will cause a performance problem, but I doubt it. Try it and see how it performs before you consider storing the total, which would be for performance reasons only and therefore a case of "optimizing early" - an anti-pattern to be avoided.
Upvotes: 2