Electrionics

Reputation: 6782

Relative quality of sorted array

I have two sorting algorithms that produce different results (I sort items by relevance). Both return the same items, but in a different order. I know that the first algorithm gives better results than the second. I want a relative value (from 0 to 1) that says something like "the first N values of array2 have 0.73 of the quality of the first N values of array1" (I compare only the first elements because those are what the user sees without taking any action). The first thing that comes to mind is to use the sum of the differences between the positions in array1 and array2. For example:

array1: 1 2 3 4 | 5 6 7 8 9

array2: 8 6 2 3 | 7 4 1 5 9 - positions in array1

array2*: 5 5 2 3 | (values greater than 4 are replaced with 5 to keep the relative value in the range 0..1)

I want to compare the first 4 elements:

S = 1 + 2 + 3 + 4 = 10 (the sum of the reference positions, used as the maximum deviation)

D = |1 - 5| + |2 - 5| + |3 - 2| + |4 - 3| = 9 (the absolute deviation)

To calculate the relative quality I use the following formula: (S - D)/S = 0.1.
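The calculation above can be sketched in Python as follows. This is only one reading of the formula: the function name `positional_quality` and the parameters `baseline`, `candidate` and `n` are hypothetical, and it assumes both arrays hold the same distinct items.

```python
def positional_quality(baseline, candidate, n):
    """Relative quality (S - D) / S of the first n items of candidate.

    baseline  -- items in the reference order (array1)
    candidate -- the same items in the order under test (array2)
    """
    # 1-based position of every item in the reference order.
    pos = {item: i for i, item in enumerate(baseline, start=1)}
    cap = n + 1  # positions beyond the first n are all replaced with n + 1

    # S: sum of the reference positions 1..n.
    s = sum(range(1, n + 1))
    # D: sum of |own position - capped reference position| over the first n items.
    d = sum(
        abs(i - min(pos[item], cap))
        for i, item in enumerate(candidate[:n], start=1)
    )
    return (s - d) / s


array1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
array2 = [8, 6, 2, 3, 7, 4, 1, 5, 9]
print(positional_quality(array1, array2, 4))  # 0.1, matching the hand calculation
```

Under this reading the result is not guaranteed to stay in 0..1: a candidate whose first four items sit at reference positions 6, 7, 8, 1 gives D = 4 + 3 + 2 + 3 = 12, so (S - D)/S = -0.2.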

Is there a standard algorithm for this? What are the disadvantages of this algorithm?

Upvotes: 1

Views: 141

Answers (1)

amit

Reputation: 178481

What you are looking for is probably DCG [Discounted Cumulative Gain] and nDCG [normalized DCG], which are used to evaluate the quality of a relevance ranking.

This assumes one list [let it be list2] is the baseline, the "absolute truth", and that list1 should be as close to it as possible.
The idea is that the first element being out of order matters more than the 10th element being out of order.

The solution is described in more detail, with an example, in my answer in this post [sorry for the self-promotion, it just seems to fit well here]. The basic idea is to evaluate:

DCG(list1)/DCG(list2)

Where the relevance of each element is derived from list2 itself, for example: rel_i = 1/log(1+i), with i being the element's position in list2.

Notes:

  • Of course, DCG can be calculated on only the relevant n elements rather than on the entire list.
  • This solution yields a result of 1 if list1 == list2.
  • This solution assumes that only the positions of the elements matter; it completely disregards their numerical values.

Upvotes: 1
