Reputation: 113
I need to create a list with one billion integers and they must all be unique. I also need this to be done extremely fast.
Creating a list and adding random numbers one by one and checking to see if each one is a duplicate is extremely slow.
It seems to be quite fast if I just populate a list with random numbers without checking whether they are duplicates and then call Distinct().ToList(). I repeat this until there are no more duplicates. However, the extra memory used by creating a new list is not optimal. Is there a way to get the performance of Distinct(), but instead of creating a new list, have it modify the source list?
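Roughly what I'm doing at the moment (with count standing in for the one-billion target) is something like this:
var rand = new Random();
var list = new List<int>();
do
{
    // Top the list up with unchecked random numbers...
    while (list.Count < count)
        list.Add(rand.Next());
    // ...then throw away the duplicates, which allocates a whole new list.
    list = list.Distinct().ToList();
} while (list.Count < count);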
Upvotes: 10
Views: 29800
Reputation: 11
I couldn't find anything good enough on the internet, so I ended up writing this method. It eliminates unnecessary generation of random numbers in case of duplicates and also does not keep a large list of candidate numbers in memory, in case you're working with large min-max ranges.
// Generates a list of random numbers between min (inclusive) and max (exclusive), without any duplicates.
// Requires: using System.Security.Cryptography; (for RandomNumberGenerator)
List<int> GenerateRandomNumbers(int min, int max, int count)
{
    if (max <= min || max - min < count)
    {
        throw new ArgumentException($"Invalid arguments: min {min}, max {max} and count {count}.");
    }
    List<int> numbers = new();
    // Maps a position to the value currently stored there (a sparse Fisher-Yates shuffle).
    Dictionary<int, int> dict = new();
    for (int i = 0; i < count; i++)
    {
        // Pick a random position among the values that have not been drawn yet.
        int r = RandomNumberGenerator.GetInt32(max - min - i);
        // int r = Random.Shared.Next(max - min - i);

        // The value at position r (defaults to r itself if it was never swapped).
        if (!dict.TryGetValue(r, out int n))
        {
            n = r;
        }
        numbers.Add(n + min);

        // Move the value at the last not-yet-drawn position into position r.
        if (!dict.TryGetValue(max - min - i - 1, out int last))
        {
            last = max - min - i - 1;
        }
        dict[r] = last;
    }
    return numbers;
}
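For example, drawing 1,000 unique values from the range [0, 1,000,000) (the numbers here are purely illustrative) looks like this; Distinct() is only used to demonstrate that there are no repeats:
List<int> numbers = GenerateRandomNumbers(0, 1_000_000, 1_000);
Console.WriteLine(numbers.Count);              // 1000
Console.WriteLine(numbers.Distinct().Count()); // also 1000, so no duplicates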
Upvotes: 0
Reputation: 26309
You can trick LINQ into jumbling the numbers for you by supplying a random-number lambda to OrderBy:
Random rand = new Random();
var ints = Enumerable.Range(0, 1000000000).OrderBy(i => rand.Next());
Upvotes: 0
Reputation: 1667
I found this the fastest while maintaining randomness:
Random rand = new Random();
var ints = Enumerable.Range(0, numOfInts)
    .Select(i => new Tuple<int, int>(rand.Next(numOfInts), i))
    .OrderBy(i => i.Item1)
    .Select(i => i.Item2);
...basically assigning a random id to each int, sorting by that id, and then selecting the resulting list of ints.
Upvotes: 16
Reputation: 108880
If the amount of possible integers from which you draw is significantly larger (say by a factor of 2) than the amount of integers you want, you can simply use a HashSet<T> to check for duplicates.
List<int> GetUniqueRandoms(Random random, int count)
{
    List<int> result = new List<int>(count);
    HashSet<int> set = new HashSet<int>(count);
    for (int i = 0; i < count; i++)
    {
        int num;
        do
        {
            num = random.Next();
        } while (!set.Add(num)); // Add returns false for a duplicate, so draw again
        result.Add(num);
    }
    return result;
}
This allocates the collections with the correct capacity to avoid reallocation during growth. Since your collections are huge, this should be a significant improvement.
You can also use Distinct a single time:
IEnumerable<int> RandomSequence(Random random)
{
    while (true)
    {
        yield return random.Next();
    }
}
RandomSequence(rand).Distinct().Take(1000000000).ToList();
But with both solutions you need enough memory for a HashSet<int> and a List<int>.
If the amount of possible integers from which you draw is about as large as the amount of integers you want, you can create an array that contains all of them, shuffle them and finally cut off those you're not interested in.
You can use Jon Skeet's shuffle implementation.
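If you only need the idea rather than a full implementation, a rough sketch of the shuffle-and-cut-off approach (this is not Jon Skeet's code, just a plain partial Fisher-Yates that only shuffles the first count positions; the method name is illustrative, and it assumes System.Linq and that the whole candidate range fits in an array) could look like this:
int[] GetShuffledRandoms(int rangeSize, int count, Random random)
{
    // All candidate values 0..rangeSize-1.
    int[] candidates = Enumerable.Range(0, rangeSize).ToArray();

    // Partial Fisher-Yates: only the first count slots need to be randomized.
    for (int i = 0; i < count; i++)
    {
        int j = random.Next(i, rangeSize); // pick from the not-yet-placed tail
        (candidates[i], candidates[j]) = (candidates[j], candidates[i]);
    }

    // Cut off the part you're not interested in.
    return candidates[..count];
}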
Upvotes: 4
Reputation: 1821
Taking the question literally (a list with one billion integers and they must all be unique):
Enumerable.Range(0, 1000000000)
But along the lines of CodeCaster's answer, you can create the list and shuffle it at the same time:
var count = 1000000000;
var list = new List<int>(count);
var random = new Random();
list.Add(0);
for (var i = 1; i < count; i++)
{
    // Inside-out Fisher-Yates: the new value i goes to a random position in [0, i].
    var swap = random.Next(i + 1);
    if (swap == i)
    {
        list.Add(i);
    }
    else
    {
        list.Add(list[swap]);
        list[swap] = i;
    }
}
Upvotes: 2
Reputation: 71591
What if you created the list in a sorted but still random fashion (such as adding a random number to the last element of the list to get the next element), and then shuffled the list with a Fisher-Yates-Durstenfeld shuffle? That would execute in linear time overall, which is pretty much as good as it gets for list generation. However, it might have some significant bias that would affect the distribution.
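A rough sketch of that idea (the step size of 1 or 2 is arbitrary; it keeps the running total inside int range for a billion elements, at the cost of squeezing all the values into a narrow band):
var random = new Random();
var count = 1000000000;
var list = new List<int>(count);

// Build a strictly increasing (hence unique) list by adding a small random
// positive increment to the previous element each time.
int current = 0;
for (int i = 0; i < count; i++)
{
    current += random.Next(1, 3); // +1 or +2
    list.Add(current);
}

// Fisher-Yates-Durstenfeld shuffle to destroy the ordering.
for (int i = list.Count - 1; i > 0; i--)
{
    int j = random.Next(i + 1);
    (list[i], list[j]) = (list[j], list[i]);
}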
Upvotes: 0
Reputation: 888167
You can track duplicates in a separate HashSet<int>:
var rand = new Random();
var set = new HashSet<int>();
var nums = new List<int>();
while (nums.Count < 1000000000) {
    int num;
    do {
        num = rand.Next();
    } while (set.Contains(num)); // draw again if we've already seen this number
    set.Add(num);
    nums.Add(num);
}
You need a separate List<int> to store the numbers because a hashset will not preserve your random ordering.
Upvotes: 2
Reputation: 151720
Do the integers need to be in a certain range? If so, you could create an array or list with all numbers in that range (for example from 1 to 1000000000) and shuffle that list.
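For example, a bare-bones version of that (assuming the range 1 to 1,000,000,000 is acceptable and that you can afford the roughly 4 GB array) might look like:
int n = 1000000000;
var random = new Random();

// Fill an array with every number in the range; uniqueness is guaranteed by construction.
int[] numbers = new int[n];
for (int i = 0; i < n; i++)
    numbers[i] = i + 1;

// Fisher-Yates shuffle in place.
for (int i = n - 1; i > 0; i--)
{
    int j = random.Next(i + 1);
    (numbers[i], numbers[j]) = (numbers[j], numbers[i]);
}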
Upvotes: 14