somejkuser

Reputation: 9040

Rating algorithm seems off

I'm writing a rating algorithm for locations in my application. The algorithm does the following:

It takes the average rating of a club and multiplies it by a club multiplier.

The club multiplier is a decimal multiplied against the average rating to account for the club's impression against the total.

Currently my algorithm is the following:

CLUB RATING = SUM(RATINGS FOR CLUB) / COUNT(RATINGS FOR CLUB)

CLUB MULTIPLIER = CLUB TOTAL NUMBER OF RATINGS / TOTAL NUMBER OF RATINGS FOR ALL CLUBS

WEIGHTED VALUE = CLUB RATING * CLUB MULTIPLIER

I came up with this algorithm myself.

My reasoning was that a club's influence is its number of ratings relative to the number of ratings for all clubs. That ratio is the multiplier, and multiplying it by the club's plain average gives the weighted average of this club against all clubs.
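A minimal Python sketch of the three formulas above, using the figures from the dataset below. The function and variable names are only illustrative. It assumes the multiplier is scaled by 100, since the `clubmultiplier` values in the dataset match a percentage rather than a raw fraction.

```python
# Sketch of the formulas above; names are illustrative, not from the application.
def weighted_value(club_avg_rating, club_num_ratings, total_num_ratings):
    # CLUB MULTIPLIER, scaled by 100 (assumption: the dataset shows percentages)
    multiplier = club_num_ratings / total_num_ratings * 100
    # WEIGHTED VALUE = CLUB RATING * CLUB MULTIPLIER
    return multiplier, club_avg_rating * multiplier

TOTAL_RATINGS = 12321
clubs = [  # (locid, average rating, number of ratings)
    (332, 4.4, 1121),
    (329, 3.1, 909),
    (1681, 4.7, 517),
    (1710, 4.1, 505),
    (3312, 4.2, 398),
]

results = {}
for locid, avg, num in clubs:
    multiplier, value = weighted_value(avg, num, TOTAL_RATINGS)
    results[locid] = (multiplier, value)
    print(locid, round(multiplier, 4), round(value))
```

Because CLUB RATING * CLUB MULTIPLIER simplifies to SUM(RATINGS FOR CLUB) / TOTAL NUMBER OF RATINGS FOR ALL CLUBS, the weighted value orders clubs by the sum of their ratings, so the number of ratings tends to dominate the ordering.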

Here is my resulting dataset:

Array
(
    [0] => Array
        (
            [locid] => 332
            [totalclubsnumratings] => 12321
            [clubaveragerating] => 4.4
            [clubnumratings] => 1121
            [clubmultiplier] => 9.0982874766659
            [weightedvalue] => 40.00
        )

    [1] => Array
        (
            [locid] => 329
            [totalclubsnumratings] => 12321
            [clubaveragerating] => 3.1
            [clubnumratings] => 909
            [clubmultiplier] => 7.3776479181885
            [weightedvalue] => 23.00
        )

    [2] => Array
        (
            [locid] => 1681
            [totalclubsnumratings] => 12321
            [clubaveragerating] => 4.7
            [clubnumratings] => 517
            [clubmultiplier] => 4.1960879798718
            [weightedvalue] => 20.00
        )

    [3] => Array
        (
            [locid] => 1710
            [totalclubsnumratings] => 12321
            [clubaveragerating] => 4.1
            [clubnumratings] => 505
            [clubmultiplier] => 4.0986932878825
            [weightedvalue] => 17.00
        )

    [4] => Array
        (
            [locid] => 3312
            [totalclubsnumratings] => 12321
            [clubaveragerating] => 4.2
            [clubnumratings] => 398
            [clubmultiplier] => 3.2302572843113
            [weightedvalue] => 14.00
        )

)

The problem is I can't tell whether it's calculating correctly. The club with locid 329 (the second club) has more ratings but a much lower average rating than the third club, locid 1681, which has fewer ratings but a higher average rating.

Should I expect the ordering to have some clubs with a higher weighted value but a lower club average rating, or am I missing a second algorithm that re-determines the club rating?

I'd like someone to look at this and tell me what this algorithm is doing incorrectly.

Upvotes: 2

Views: 80

Answers (1)

btilly

Reputation: 46445

If you want a hack to come up with reasonable uncertainties, first calculate the variance of the average vote, across all votes for all clubs.

Then for each club, call the standard deviation of its rating sqrt(variance / votes), so that it shrinks as the club collects more votes. (This is factually wrong, but it will work well enough.) That gives you a median and a 95% confidence interval of 2 standard deviations to each side.

Now you can choose to be pessimistic about clubs, and give each of them a rating of, say, 1 standard deviation below its median. If you do this, then a club with two 5.0 ratings will likely wind up worse than a club with a 4.5 rating after 100 votes. To get a truly top ranking you have to both do well and have a lot of votes.
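A rough Python sketch of this hack, under some assumptions: the votes are made up, the names are hypothetical, the shared variance is the population variance of all individual votes, and the per-club spread is taken as sqrt(variance / votes) so that it shrinks as votes accumulate.

```python
import math
from statistics import pvariance

def pessimistic_scores(votes_by_club, k=1.0):
    """Score each club as its average minus k approximate standard deviations."""
    all_votes = [v for votes in votes_by_club.values() for v in votes]
    variance = pvariance(all_votes)  # one shared variance across every vote
    scores = {}
    for club, votes in votes_by_club.items():
        mean = sum(votes) / len(votes)
        std_dev = math.sqrt(variance / len(votes))  # smaller with more votes
        scores[club] = mean - k * std_dev
    return scores

# Made-up votes: "a" has two 5.0 ratings, "b" averages 4.5 over 100 votes.
votes_by_club = {
    "a": [5.0, 5.0],
    "b": [5.0] * 70 + [4.0] * 20 + [3.0] * 5 + [1.0] * 5,
}
scores = pessimistic_scores(votes_by_club)
for club, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(club, round(score, 2))
```

With this sample data the 100-vote club ranks above the club with two perfect votes. Raising `k` makes the ranking more pessimistic about clubs with few votes.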

Upvotes: 2
