Reputation: 723
I need a high performance random number generator that is thread-safe. I need only random bytes in the value type (which is ulong
for now), not within ranges. I've used the C# built-in Random
class, but it was kind of slow and not thread-safe.
Later I moved to XORShift functions that actually works very fine, but to achieve thread-safeness I need to put the calculation in lock
, and that degrades the performance drastically.
What I'm using to generate a random ulong
is the following:
public class Rand
{
ulong seed = 0;
object lockObj = new object();
public Rand()
{
unchecked
{
seed = (ulong)DateTime.Now.Ticks;
}
}
public Rand(ulong seed)
{
this.seed = seed;
}
public ulong GetULong()
{
unchecked
{
lock (lockObj)
{
ulong t = 0;
t = seed;
t ^= t >> 12;
t ^= t << 25;
t ^= t >> 27;
seed = t;
return t * 0x2545F4914F6CDD1D;
}
}
}
}
This works fine and fast, but locking makes it take about 1-2us
if it is called from 200 concurrent threads, otherwise calculation finishes under 100ns
.
If I remove locking there is a chance two threads take the same seed and will calculate the same random which is not good for my purposes. If I'm removing the ulong t
declaration and work directly on the seed then there will be a very little chance to generate the same random for two concurrent calls, but there is also a chance the value will be shifted out from the value range, like t << 25
will be called many times in a row by different threads without carrying the rotation it will become just simply 0.
I think the proper way would be if there is a shared value that may be changed by any concurrent call and work with that value in the calculation methods, since these values are atomic (at least withing CPU cores) it is not a problem if many calculations are using it in the same time, but that is a problem if this value shifts out from the bitrange.
Is there any good solution to solve this problem? I'd be thankful for any help.
Edit: Ok, I've forgot to mention I have no control over the threads, because async tasks are calling this function, so threads are coming randomly from the threadpool, using thread ID is also a no solution, since there is a chance one specific thread never will call this method again at all, and keeping an instance for that ID is not a good thing.
Upvotes: 2
Views: 2229
Reputation: 470
The less code we execute in the critical section, the faster it will work. It runs 30-50% faster on my CPU. You can also use an asynchronous process that will prepare the next collection
public sealed class Rand
{
private ulong seed = 0;
private readonly object lockObj = new object();
public Rand()
{
unchecked
{
seed = (ulong) DateTime.Now.Ticks;
}
_current = 500;
}
public Rand(ulong seed)
{
this.seed = seed;
}
private ulong[] _batch = new ulong[501];
private int _current = -1;
public ulong GetULong2()
{
unchecked
{
ulong t = 0;
lock (lockObj)
{
t ^= seed >> 12;
t ^= t << 25;
t ^= t >> 27;
seed = t;
}
return t * 0x2545F4914F6CDD1D;
}
}
public ulong GetULong5()
{
unchecked
{
var t = seed;
t *= (uint)Thread.CurrentThread.ManagedThreadId;
t ^= t >> 12;
t ^= t << 25;
t ^= t >> 27;
seed = t;
return t * 0x2545F4914F6CDD1D;
}
}
public ulong GetULong()
{
unchecked
{
do
{
var current = Interlocked.Increment(ref _current);
if (current < 501)
return _batch[current];
lock (lockObj)
{
if (_current >= 500)
{
ulong t = seed;
for (int i = 0; i < 501; i++)
{
t ^= t >> 12;
t ^= t << 25;
t ^= t >> 27;
var result = t * 0x2545F4914F6CDD1D;
_batch[i] = result;
}
seed = t;
_current = -1;
}
}
}while(true);
}
}
}
Upvotes: 3
Reputation: 39277
You can do it without locking and still be threadsafe. Assuming the calculation is very fast (it is) and there is slower code executing around it, it's likely faster to simply recalculate if another thread changes it in between starting the calculation and finishing it. You can do that with an Interlocked.CompareExchange
spin loop. The only difficulty is that there is no ulong version of that so we have to use an unsafe method to get the equivalent.
private static unsafe ulong InterlockedCompareExchange(ref ulong location,
ulong value, ulong comparand)
{
fixed (ulong* ptr = &location)
{
return (ulong)Interlocked.CompareExchange(ref *(long*)ptr, (long)value, (long)comparand);
}
}
public ulong GetULong()
{
unchecked
{
ulong prev = seed;
ulong t = prev;
t ^= t >> 12;
t ^= t << 25;
t ^= t >> 27;
while (InterlockedCompareExchange(ref seed, t, prev) != prev)
{
prev = seed;
t = prev;
t ^= t >> 12;
t ^= t << 25;
t ^= t >> 27;
}
return t * 0x2545F4914F6CDD1D;
}
}
Upvotes: 3
Reputation: 723
Ok this solution by l33t is a very nice and elegant way to fix the cross-thread issue, and this solution by Stanislav also a suitable way to prevent on-demand generation by pre-generating and caching batches. Thanks everyone for the ideas.
Meanwhile I changed the bitshift to rotate, this will avoid shifting out the seed to 0, and I could omit the locking. Apparently it works just fine, and with very small latency (200 concurrent threads with about 100-120ns/call)
public class Rand
{
ulong seed = 0;
object lockObj = new object();
public Rand()
{
unchecked
{
seed = (ulong)DateTime.Now.Ticks;
}
}
public Rand(ulong seed)
{
this.seed = seed;
}
public ulong GetULong()
{
unchecked
{
seed ^= (seed >> 12) | (seed << (64 - 12));
seed ^= (seed << 25) | (seed >> (64 - 25));
seed ^= (seed >> 27) | (seed << (64 - 27));
seed *= 0x2545F4914F6CDD1D;
int s = Environment.CurrentManagedThreadId % 64;
return (seed >> s) | (seed << (64 - s));
}
}
// even faster
public ulong GetULong2()
{
unchecked
{
seed ^= (seed >> 12) | (seed << (64 - 12));
seed ^= (seed << 25) | (seed >> (64 - 25));
seed ^= (seed >> 27) | (seed << (64 - 27));
ulong r = seed * 0x2545F4914F6CDD1D;
seed = r;
int s = Environment.CurrentManagedThreadId % 64;
return (seed >> s) | (seed << (64 - s));
}
}
// better entropy
public ulong GetULong3()
{
unchecked
{
int s = Environment.CurrentManagedThreadId % 12;
seed ^= (seed >> (12 - s)) | (seed << (64 - (12 - s)));
seed ^= (seed << (25 - s)) | (seed >> (64 - (25 - s)));
seed ^= (seed >> (27 - s)) | (seed << (64 - (27 - s)));
ulong r = seed * 0x2545F4914F6CDD1D;
seed = r;
s = Environment.CurrentManagedThreadId % 64;
return (r >> s) | (r << (64 - s));
}
}
}
While this solution fits my demands the most, I will not mark it as an answer because it does not produce the same randoms as the one in the question, but still produces seemingly unique randoms across all threads, so still might be a solution.
Edit: Ok, after some experimenting GetULong()
was the fastest, but from 100 million random generations produced more than 23000 colliding values on 200 concurrent threads. Same with GetULong2()
. Added a little extra for GetULong3()
method to increase entropy, that produced only about 210 colliding values from 100 million generations on 200 concurrent threads, and only about 50 colliding values from 100 million generations on 500 concurrent threads.
For me this kind of entropy is more than enougn, because they must be unique, so in my application after each generation they are tried to be added to a collection atomically and if there is already one with the same key, then random is called again. 50-200 retry in 100 million events is acceptable even on general random generators, so this is more than enough for me, especially, because it is fast, and can be used tread-safe with only one instance of the random generator.
Thank you everyone for the help, I hope this may help others also.
Upvotes: -2
Reputation: 19966
Simply create one instance of Rand
on each thread. Thread-safe, no locking, thus very performant. This can be achieved using the ThreadStaticAttribute.
public static class Rand
{
[ThreadStatic] private static Rand defaultRand;
public static Rand Default => defaultRand ??= new Rand();
// Add extra methods for seeding the static instance...
}
// Then in any thread:
var randomNumber = Rand.Default.GetULong();
Upvotes: 6