Reputation: 7209
I have a large table (of sites) with several numeric columns - say a through f. (These are site rankings from different organizations, like alexa, google, quantcast, etc. Each has a different range and format; they're straight dumps from the outside DBs.)
For many of the records, one or more of these columns is null, because the outside DB doesn't have data for it. They all cover different subsets of my DB.
I want column t to be their weighted average (each of a..f has a static weight that I assign), ignoring null values (which can occur in any of them); t should be null only if they are all null.
I would prefer to do this with a simple SQL calculation, rather than doing it in app code or using some huge ugly nested if block to handle every permutation of nulls. (Given that I have an increasing number of columns to average over as I add in more outside DB sources, this would be exponentially more ugly and bug-prone.)
I'd use AVG but that's only for group by, and this is w/in one record. The data is semantically nullable, and I don't want to average in some "average" value in place of the nulls; I want to only be counting the columns for which data is there.
Is there a good way to do this?
Ideally, what I want is something like UPDATE sites SET t = AVG(a*@a_weight,b*@b_weight,...)
where any null values are just ignored and no grouping is happening.
EDIT: What I ended up using, based on van's answer and adding in correct weighted averages (assuming that a has already been normalized as needed, in this case to a float 0-1, where 1 = better):
UPDATE sites
SET t = (@a_weight * IFNULL(a, 0) + ...) / (IF(a IS NULL, 0, @a_weight) + ...)
WHERE (IF(a IS NULL, 0, 1) + ...) > 0
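To sanity-check the weighted form above, here is a minimal sketch that runs the same arithmetic against an in-memory SQLite database from Python. The weights, the three-column table, and the sample rows are made up for illustration, and COALESCE/CASE stand in for MySQL's IFNULL/IF so the statement stays portable.

```python
import sqlite3

# Hypothetical weights and sample data, chosen only for illustration.
WEIGHTS = {"a": 0.5, "b": 0.3, "c": 0.2}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sites (id INTEGER PRIMARY KEY, a REAL, b REAL, c REAL, t REAL)"
)
conn.executemany(
    "INSERT INTO sites (id, a, b, c) VALUES (?, ?, ?, ?)",
    [
        (1, 0.8, 0.4, 0.6),     # all columns present
        (2, 0.8, None, 0.6),    # b missing
        (3, None, None, None),  # nothing to average
    ],
)

cols = list(WEIGHTS)
# Numerator: weight * value, with a NULL value contributing 0.
numerator = " + ".join(f":{c} * COALESCE({c}, 0)" for c in cols)
# Denominator: sum of the weights of the non-NULL columns only.
denominator = " + ".join(
    f"CASE WHEN {c} IS NULL THEN 0 ELSE :{c} END" for c in cols
)
# Row filter: at least one column is non-NULL.
present = " + ".join(f"CASE WHEN {c} IS NULL THEN 0 ELSE 1 END" for c in cols)

conn.execute(
    f"UPDATE sites SET t = ({numerator}) / ({denominator}) "
    f"WHERE ({present}) > 0",
    WEIGHTS,
)

# Map each id to its computed t.
results = dict(conn.execute("SELECT id, t FROM sites ORDER BY id"))
```

With these sample values, row 2 is divided by 0.7 (the weights of a and c) rather than 1.0, and row 3 is skipped by the WHERE clause, so its t stays null. Because the expressions are generated from the weight dictionary, adding another source should only mean adding another entry.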
Upvotes: 5
Views: 2650
Reputation: 76962
UPDATE sites
-- TODO: you might need to round it depending on your type
SET t = (COALESCE(a, 0) +
         COALESCE(b, 0) +
         COALESCE(c, 0) +
         COALESCE(d, 0) +
         COALESCE(e, 0) +
         COALESCE(f, 0)
        ) /
        ((CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
         (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
         (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
         (CASE WHEN d IS NULL THEN 0 ELSE 1 END) +
         (CASE WHEN e IS NULL THEN 0 ELSE 1 END) +
         (CASE WHEN f IS NULL THEN 0 ELSE 1 END)
        )
WHERE 0 <> ((CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
            (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
            (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
            (CASE WHEN d IS NULL THEN 0 ELSE 1 END) +
            (CASE WHEN e IS NULL THEN 0 ELSE 1 END) +
            (CASE WHEN f IS NULL THEN 0 ELSE 1 END)
           )
You could also use COALESCE in the other parts of the expression, but that would not properly handle a rating whose value is 0, because it would be excluded. The WHERE clause avoids a division by zero, but you might need an additional UPDATE statement to handle the case where there is no rating at all for the entry.
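One possible way to fold that second UPDATE into the first is to wrap the count in NULLIF(..., 0), so that a row with no ratings divides by NULL and ends up with t set to NULL. The sketch below is not part of the statement above. It tries the idea on an in-memory SQLite database from Python, with a hypothetical three-column table and made-up rows. NULLIF also exists in MySQL, but the statement has only been exercised here against SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sites (id INTEGER PRIMARY KEY, a REAL, b REAL, c REAL, t REAL)"
)
conn.executemany(
    "INSERT INTO sites (id, a, b, c, t) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0.0, 0.6, None, None),   # a rating of 0 is still counted
        (2, None, None, None, 0.9),  # stale t that should become NULL
    ],
)

# No WHERE clause: NULLIF turns a count of 0 into NULL, and dividing
# by NULL yields NULL instead of a division by zero.
conn.execute(
    """
    UPDATE sites
    SET t = (COALESCE(a, 0) + COALESCE(b, 0) + COALESCE(c, 0))
            / NULLIF(
                (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
                (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
                (CASE WHEN c IS NULL THEN 0 ELSE 1 END),
                0)
    """
)

# Map each id to its computed average.
averages = dict(conn.execute("SELECT id, t FROM sites ORDER BY id"))
```

Row 1 averages 0.0 and 0.6 over two present columns, so the zero rating is counted rather than dropped. Row 2 has its old t overwritten with NULL, which is the "null if they're all null" behaviour asked for in the question.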
Upvotes: 3