asachet

Reputation: 6921

Is the widely used formula for DCG wrong?

There are two widely used formulas for computing the Discounted Cumulative Gain (DCG). They appear in most information retrieval lecture slides on the internet, e.g. at Stanford. One of them is, for the DCG at rank p:

$$DCG_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}$$

This is, in fact:

$$DCG_p = rel_1 + rel_2 + \sum_{i=3}^{p} \frac{rel_i}{\log_2 i}$$

because $\log_2 2 = 1$. This means that the so-called "discounted" CG is actually not discounted before the third rank!

The following rankings are therefore not distinguishable by the DCG using this formula: (10,5,1,2,...) and (5,10,1,2,...).
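A minimal Python sketch of the first formula, assuming the relevances are given as a list in rank order (position 1 first) and truncating both rankings to the four listed positions. The function name `dcg_no_rank2_discount` is purely illustrative. Because the tail is identical for both rankings, swapping the first two items leaves the score unchanged:

```python
import math


def dcg_no_rank2_discount(rels):
    """DCG as rel_1 + sum_{i=2}^{p} rel_i / log2(i).

    Rank 2 is divided by log2(2) = 1, so it is effectively undiscounted.
    `rels` holds graded relevances in rank order (position 1 first).
    """
    if not rels:
        return 0.0
    return rels[0] + sum(r / math.log2(i) for i, r in enumerate(rels[1:], start=2))


a = dcg_no_rank2_discount([10, 5, 1, 2])
b = dcg_no_rank2_discount([5, 10, 1, 2])
print(math.isclose(a, b))  # True: the two rankings get the same DCG
```

`math.isclose` is used instead of `==` only to avoid depending on floating-point summation order.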

I am guessing that the formula is incorrect and should be:

$$DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$$

Note, by the way, that the other very common formula (see Wikipedia), $DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i+1)}$, has this denominator.
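As a quick worked check, assuming linear gain and a $\log_2(i+1)$ denominator: rank 2 is then discounted by $\log_2 3 \approx 1.585$. For the rankings (10,5,1,2,...) and (5,10,1,2,...), the first two terms already differ, and everything from rank 3 onward is identical:

$$10 + \frac{5}{\log_2 3} \approx 13.15 \qquad \text{vs.} \qquad 5 + \frac{10}{\log_2 3} \approx 11.31$$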

I would not be asking if I had not seen this formula in practically every lecture I found on the internet, and even in my own lectures at UCL. Is it not wrong? It would be surprising for an error to have propagated from Wikipedia without any professor noticing... So am I the one who is wrong?

Upvotes: 2

Views: 285

Answers (1)

asachet

Reputation: 6921

I found this paper from Microsoft (see equation 6), which backs up my claim that starting the discount only at rank 3 is basically a typo. When you think about it, it makes no sense at all not to discount rank 2! The metric would be unable to distinguish the rankings (10, 5, 2) and (5, 10, 2), even though the first ranking is better. Note that all the other DCG formulas do discount rank 2 and would therefore pick up the difference.

So a "+1" is indeed missing in the log, and it is a typo that has crept into a lot of papers and lectures...
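A minimal Python sketch comparing the formulas on the rankings (10, 5, 2) and (5, 10, 2) from the answer. It assumes relevances are given as a list in rank order. The function names are hypothetical, and `dcg_exponential` is the $(2^{rel_i} - 1)/\log_2(i+1)$ variant given on Wikipedia:

```python
import math


def dcg_no_rank2_discount(rels):
    """rel_1 + sum_{i=2}^{p} rel_i / log2(i): rank 2 is divided by log2(2) = 1."""
    if not rels:
        return 0.0
    return rels[0] + sum(r / math.log2(i) for i, r in enumerate(rels[1:], start=2))


def dcg_linear(rels):
    """sum_{i=1}^{p} rel_i / log2(i + 1): every rank after the first is discounted."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))


def dcg_exponential(rels):
    """sum_{i=1}^{p} (2**rel_i - 1) / log2(i + 1): exponential-gain variant."""
    return sum((2 ** r - 1) / math.log2(i + 1) for i, r in enumerate(rels, start=1))


better, worse = [10, 5, 2], [5, 10, 2]
for name, dcg in [("no rank-2 discount", dcg_no_rank2_discount),
                  ("linear, log2(i+1)", dcg_linear),
                  ("exponential, log2(i+1)", dcg_exponential)]:
    # Print both scores side by side so the tie (or the gap) is visible.
    print(f"{name}: {dcg(better):.3f} vs {dcg(worse):.3f}")
```

The first line shows equal scores for both rankings. The two $\log_2(i+1)$ variants score (10, 5, 2) higher, as expected.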

Upvotes: 2
