Reputation: 32650
I sometimes shuffle lists or arrays using OrderBy(item => R.NextDouble()), where the Random instance R is initialized elsewhere.
This is clearly a hack, though it is the one recommended all over this site, and it does work very nicely in practice.
However, it has also been pointed out that this relies on the assumption that the sorting algorithm is not confused by an item whose key changes between evaluations and does not, for example, enter an infinite loop.
My question is whether there is some sort of implicit or explicit guarantee that this won't happen. I could not find anything about it in MSDN.
While there are very good, specialized O(n) algorithms for this purpose, and they would obviously be the right solution, I don't always want to look them up and copy-paste them into little side projects (we are not talking about production code here).
Note that performance is not an issue at all. There is also no need for good or secure randomness (a cryptographic library would have to be used in that case); this is just about shuffling things up a bit.
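For reference, the pattern in question looks roughly like this (a minimal sketch; the element type and the data are placeholders):

using System;
using System.Collections.Generic;
using System.Linq;

class OrderByShuffleExample
{
    // In the real code R is initialized elsewhere; shown here for completeness.
    static readonly Random R = new Random();

    static List<int> ShuffleByOrderBy(IEnumerable<int> items)
    {
        // Each element gets a key from a call to NextDouble(); the question is
        // whether OrderBy is allowed to call the selector again for the same item.
        return items.OrderBy(item => R.NextDouble()).ToList();
    }
}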
Upvotes: 3
Views: 212
Reputation: 113272
It's not guaranteed to continue working as well as it does.
It's highly unlikely that it would ever stop working as well as it does, because a break would require a change to the pre-computation of key values, which would be very expensive in many cases. As such, a change that broke it is unlikely to ever happen. It would also be an observable change in other ways, in terms of how often the selector is called. That sort of observable change isn't ruled out (I've had PRs accepted into .NET Core that reduced how often selectors are called in other LINQ methods), but the change would have to have a very strong benefit, especially since it would increase, rather than decrease, the number of calls.
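A quick way to see the current behaviour (observed, not guaranteed) is to count how often the key selector actually runs:

using System;
using System.Linq;

class SelectorCallCount
{
    static void Main()
    {
        var rnd = new Random();
        int calls = 0;

        var shuffled = Enumerable.Range(0, 10)
            .OrderBy(_ => { calls++; return rnd.NextDouble(); })
            .ToList();

        // Current implementations compute every key exactly once up front,
        // so this prints 10; nothing in the documentation promises it.
        Console.WriteLine(calls);
    }
}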
It doesn't currently work very well, because Enumerable.OrderBy (unlike other OrderBy methods in LINQ, such as the one on Queryable or in PLINQ) is guaranteed to give a stable ordering (equal keys always appear in their original order), which gives you a bias.
It is useful for quick-to-write random orderings where the quality of the shuffle only needs to be fairly good. For a more robust shuffle, use something like:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class Shuffler
{
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, int seed)
    {
        return new ShuffledEnumerable<T>(source, seed);
    }

    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        return new ShuffledEnumerable<T>(source);
    }

    private class ShuffledEnumerable<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> _source;
        private readonly int? _seed;

        public ShuffledEnumerable(IEnumerable<T> source)
        {
            _source = source;
        }

        public ShuffledEnumerable(IEnumerable<T> source, int seed)
            : this(source)
        {
            _seed = seed;
        }

        public IEnumerator<T> GetEnumerator()
        {
            Random rnd = _seed.HasValue ? new Random(_seed.GetValueOrDefault()) : new Random();

            // Copy the source so the shuffle never mutates the caller's collection.
            T[] array = _source.ToArray();

            // Fisher-Yates shuffle: walk backwards, swapping each element
            // with a randomly chosen element at or before it.
            for (int i = array.Length - 1; i > 0; --i)
            {
                int j = rnd.Next(0, i + 1);
                if (i != j)
                {
                    T swapped = array[i];
                    array[i] = array[j];
                    array[j] = swapped;
                }
            }

            return ((IEnumerable<T>)array).GetEnumerator();
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}
You can improve it further still, if you need cryptographic quality of randomness, by replacing the use of Random with a suitable cryptographic RNG.
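As a sketch of that swap (assuming .NET Core 3.0 or later, where RandomNumberGenerator.GetInt32 is available; the class name CryptoShuffler is just for illustration):

using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public static class CryptoShuffler
{
    public static T[] CryptoShuffle<T>(this IEnumerable<T> source)
    {
        T[] array = source.ToArray();

        // Same Fisher-Yates loop as above, but each index comes from a
        // cryptographically strong random number generator.
        for (int i = array.Length - 1; i > 0; --i)
        {
            int j = RandomNumberGenerator.GetInt32(0, i + 1);
            if (i != j)
            {
                T swapped = array[i];
                array[i] = array[j];
                array[j] = swapped;
            }
        }

        return array;
    }
}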
Upvotes: 2
Reputation: 205619
It's not guaranteed, although it works and most likely will not change. But if you wish to be 100% sure, you can provide that guarantee yourself by using an explicit projection:
enumerable.Select(item => new { Source = item, Order = R.NextDouble() }).OrderBy(item => item.Order).Select(item => item.Source)
which is of course much more verbose, but still easy enough for a quick-and-dirty approach.
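Wrapped up as a small extension method (a sketch; the name ShuffleByProjection is just for illustration), that could look like:

using System;
using System.Collections.Generic;
using System.Linq;

public static class ProjectionShuffle
{
    public static IEnumerable<T> ShuffleByProjection<T>(this IEnumerable<T> source, Random random)
    {
        // Each element's random key is computed once, when its pair is created;
        // OrderBy then sorts by that fixed value, so re-evaluation is impossible.
        return source
            .Select(item => new { Source = item, Order = random.NextDouble() })
            .OrderBy(pair => pair.Order)
            .Select(pair => pair.Source);
    }
}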
Upvotes: 2
Reputation: 171178
According to the documentation, this is not guaranteed to work.
I have peeked into the internals. The sorting algorithm first captures all sort keys into a temporary array and never calls the key selector again, so in the current implementation this is safe. A new .NET version, or even a patch, could invalidate this technique.
You might argue that for compatibility reasons this will never change, and the probability of that being true is something like 95% (source: me). Microsoft has very high compatibility standards.
Another problem is that if you generate the same value for two keys, they will not be positioned randomly. Instead, they will be positioned according to the internals of the sorting algorithm, which might be deterministic. For example, they might never swap their relative order.
I would only rely on this in short-term throw-away projects.
Another hack in this spirit is to sort by Guid.NewGuid(). That saves you another line of code and has no seeding problems. On the other hand, GUIDs might have a pattern or even be sequential, which is also quite problematic.
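For completeness, that variant is essentially a one-liner (a minimal sketch with placeholder data):

using System;
using System.Linq;

class GuidShuffleExample
{
    static void Main()
    {
        var items = Enumerable.Range(1, 10);

        // One fresh GUID per element; the caveats about GUID structure still apply.
        var shuffled = items.OrderBy(_ => Guid.NewGuid()).ToList();

        Console.WriteLine(string.Join(", ", shuffled));
    }
}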
I would fail any of this in a code review of production code.
Upvotes: 4