Reputation: 945
I have been working on a chess engine for some time now. To improve the engine, I wrote some code that loads chess positions into memory for use by a tuner. I have around 1.85B FENs on my machine, which adds up to about 40 GB (24 bytes per position).
After loading, I end up with a vector of positions:
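#include <bitset>
#include <cstddef>
#include <vector>

struct Position {
    std::bitset<8 * 24> bits{};
};

int main() {
    std::vector<Position> positions{};
    // mimic some data loading (size_t counter: 1.85e9 is too close to INT_MAX for comfort)
    for (std::size_t i = 0; i < 1850000000ULL; i++) {
        positions.push_back(Position{});
    }
    // ...
}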
The data is organised in the following way:
The positions are taken from games, so neighbouring positions are separated by just a few moves. Usually about 40-50 consecutive positions come from the same game or line and are therefore quite similar.
Eventually I will read 16384 positions in a single batch, and ideally none of those positions come from the same game. Therefore I do some initial shuffling before using the data.
My current shuffling method is this:
auto rng = std::default_random_engine {};
std::shuffle(std::begin(positions), std::end(positions), rng);
Unfortunately this takes quite some time (about 1-2 minutes). Since I don't require a perfect shuffle, I assume that some cheaper shuffles exist.
My second approach was:
// size_t avoids the signed overflow that i * 16384 would hit with int
for (std::size_t i = 0; i < positions.size(); i++) {
    std::swap(positions[i], positions[(i * 16384) % positions.size()]);
}
which should ensure that no two positions within a single batch come from the same game, because the positions end up evenly spaced by 16384 entries.
I was wondering if there is an even simpler, faster solution, especially considering that the modulo operator costs quite a few clock cycles.
I am happy for any "trivial" solution.
Greetings Finn
Upvotes: 2
Views: 1771
Reputation: 122830
There is a tradeoff to be made: shuffling a std::vector<size_t> of indices can be expected to be cheaper than shuffling a std::vector<Position>, at the cost of an indirection when accessing the Positions via the shuffled indices. Actually, the example on cppreference for std::iota is doing something along those lines (it uses iterators):
#include <algorithm>
#include <iostream>
#include <list>
#include <numeric>
#include <random>
#include <vector>

int main()
{
    std::list<int> l(10);
    std::iota(l.begin(), l.end(), -4);

    std::vector<std::list<int>::iterator> v(l.size());
    std::iota(v.begin(), v.end(), l.begin());

    std::shuffle(v.begin(), v.end(), std::mt19937{std::random_device{}()});

    std::cout << "Contents of the list: ";
    for(auto n: l) std::cout << n << ' ';
    std::cout << '\n';

    std::cout << "Contents of the list, shuffled: ";
    for(auto i: v) std::cout << *i << ' ';
    std::cout << '\n';
}
Instead of shuffling the list directly, a vector of iterators is shuffled (with a std::vector, indices would work as well), so std::shuffle only needs to swap iterators (or indices) rather than the more costly actual elements (in the example the "costly to swap" elements are just ints).
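For a std::vector, the same idea with indices instead of iterators could look roughly like the sketch below. The helper names shuffled_indices and at_shuffled and the fixed seed parameter are made up for illustration; they are not part of the question's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: build 0, 1, ..., n-1 and shuffle it.
// Only the indices are swapped, never the (larger) elements they refer to.
std::vector<std::size_t> shuffled_indices(std::size_t n, unsigned seed) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::shuffle(idx.begin(), idx.end(), std::mt19937{seed});
    return idx;
}

// Hypothetical helper: access the k-th element in shuffled order.
// This is the extra indirection mentioned above: data[idx[k]] instead of data[k].
template <class T>
const T& at_shuffled(const std::vector<T>& data,
                     const std::vector<std::size_t>& idx,
                     std::size_t k) {
    assert(k < idx.size() && idx[k] < data.size());
    return data[idx[k]];
}
```

With 1.85B positions, a std::vector<std::size_t> of indices takes about 14.8 GB on a 64-bit platform; since 1.85e9 fits into 32 bits, std::uint32_t indices would halve that.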
For a std::list I don't expect a big difference between iterating in order and iterating via shuffled iterators. For a std::vector, on the other hand, I do expect a significant impact. Hence, I would shuffle indices, then rearrange the vector once, and profile to see which performs better.
PS: As noted in the comments, std::shuffle is already the optimal algorithm for shuffling a range of elements. However, note that it swaps each element twice on average (possible implementation from cppreference):
for (diff_t i = n-1; i > 0; --i) {
    using std::swap;
    swap(first[i], first[D(g, param_t(0, i))]);
}
On the other hand, shuffling the indices and then rearranging the vector only requires copying/moving each element once (when additional memory is available).
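A minimal sketch of that "shuffle indices, then rearrange once" variant, assuming enough memory for a second buffer of the same size. The function name shuffled_copy and the seed parameter are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: returns the elements of src in a random order.
// Each element is moved exactly once, into a freshly allocated vector.
template <class T>
std::vector<T> shuffled_copy(std::vector<T> src, unsigned seed) {
    // Shuffle the cheap indices instead of the elements themselves.
    std::vector<std::size_t> order(src.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), std::mt19937{seed});

    // Gather: one move per element, written sequentially into dst.
    std::vector<T> dst;
    dst.reserve(src.size());
    for (std::size_t i : order) {
        assert(i < src.size());
        dst.push_back(std::move(src[i]));
    }
    return dst;
}
```

It could be called as positions = shuffled_copy(std::move(positions), seed). The reads from src are random-access while the writes to dst are sequential, so whether this beats a plain std::shuffle on 40 GB of data is something only profiling can tell.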
Upvotes: 4
Reputation: 238401
Randomness won't guarantee that a sample contains no two positions from the same game, which is what you wanted to avoid. I propose the following pseudo-shuffle, which does prevent sampling from the same game (given a sufficiently large population):
Upvotes: 2