GigAHerZ
GigAHerZ

Reputation: 53

Sorting List randomly by weights

The idea is that i have a list of items with each item having a weight attached. Now i want to randomize the order of the list, but i also want to take the weight into account to "bias" the randomization process.

There are multiple approaches for that, but i'm particulary interested in one certain approach and why it is producing different distribution than i expect. The code is probably fine, but i want to get understanding, why it does what it does?

I know other algorithms that produce expected result.

One is to basically create a range, each with the length of particular item's weight and then randomly pick a point from the full range that got produced. This creates item one-by-one, by doing that over and over again until there are no items/no range to pick from. It produces the expected ratios over a million tries.

There's also another algorithm, that doesn't need range to be produced, but expects initial list to be in random order and then through substraction and checking against x <= 0, also takes items one-by-one to produce a list of randomly, but biased order of items. It produces the expected ratios over a million tries.

The approach i want to currently concentrate on, is to produce a ordering value for each item, and then order the whole list in one go. The code i have written does not produce expected ratios over a million tries.

C# code for a console application

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleTest1
{
    class Program
    {
        static void Main(string[] args)
        {
            var myList = new List<Item>
            {
                new Item { Name = "A70", Weight = 70},
                new Item { Name = "B20", Weight = 20},
                new Item { Name = "C10", Weight = 10},
            };

            var stats = new Dictionary<string, int>();
            myList.ForEach(x => stats.Add(x.Name, 0));

            var rnd = new Random();
            for (var i = 0; i < 1000000; ++i)
            {
                var newList = GetSorted(myList, rnd);
                ++stats[newList.First().Name];
            }

            var sum = stats.ToList().Sum(x => x.Value);
            stats.ToList().ForEach(x => Console.WriteLine($"{x.Key}: {((float)x.Value / sum * 100):0.00}%"));

            Console.ReadLine();
        }

        private static IEnumerable<Item> GetSorted(IEnumerable<Item> list, Random rnd)
        {
            return list
                .Select(x => new
                {
                    Order = x.Weight * rnd.NextDouble(),
                    Item = x
                })
                .OrderByDescending(x => x.Order)
                .Select(x => x.Item);
        }
    }

    class Item
    {
        public string Name { get; set; }
        public int Weight { get; set; }
    }
}

By this code, i would expect probability of each item to be in the first position of the list to be very similar to the weights of each item. Instead of 70-20-10% ratio, i get roughly 85-13-2% ratios. It almost looks like there's some kind of non-linearity coming into play, but i just don't get it right now.

The question here is to understand, how this given code works. I know and have different approaches that work and produces expected result.

Thank you!

Upvotes: 2

Views: 1023

Answers (2)

Theodor Zoulias
Theodor Zoulias

Reputation: 43554

Here is an explanation. For simplicity lets consider a simpler case:

var myList = new List<Item>
{
    new Item { Name = "A20", Weight = 20},
    new Item { Name = "B10", Weight = 10},
};

We are determining the sort order by multiplying Weight with a random number. If we multiply the weight of A20 with any number above 0.5, will be sorted first whatever the random number for B10 is. If we multiply the weight of A20 with any number bellow 0.5, then it has an equal chance with B10 to by the first. So the distribution will be 75%/25%, and not the initially intuitive 67%/33%.

To fix the algorithm you must use the square root of the Weight.

.Select(x => new
{
    Order = Math.Sqrt(x.Weight) * rnd.NextDouble(),
    Item = x
})

Update: Squaring the weights is not a good fix.

Upvotes: 2

jdweng
jdweng

Reputation: 34419

I fixed code to work :

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication107
{
    class Program
    {
        static void Main(string[] args)
        {
             var myList = new List<Item>
            {
                new Item { Name = "A70", Weight = 70},
                new Item { Name = "B20", Weight = 90},
                new Item { Name = "C10", Weight = 100},
            };

            var stats = new Dictionary<string, int>();
            myList.ForEach(x => stats.Add(x.Name, 0));

            var rnd = new Random();
            for (var i = 0; i < 1000000; ++i)
            {
                var newList = GetSorted(myList, rnd);
                ++stats[newList.Name];
            }

            var sum = stats.ToList().Sum(x => x.Value);
            stats.ToList().ForEach(x => Console.WriteLine("{0}: '{1}", x.Key, ((float)x.Value / sum * 100)));

            Console.ReadLine();

        }
        private static Item GetSorted(List<Item> list, Random rnd)
        {
            Item results = null;
            int value = rnd.Next(0,100);
            for(int i = 0; i < list.Count(); i++)
            {
                if (value < list[i].Weight)
                {
                    results = list[i];
                    break;
                }
            }
            return results;
        }
    }

    class Item
    {
        public string Name { get; set; }
        public int Weight { get; set; }
    }
}

Upvotes: 0

Related Questions