Reputation: 1374
I have a large array of samples that I want to feed into my deep learning model. Shuffling the array takes a long time. I don't need a perfectly random shuffle, and given the nature of the problem I don't mind a few collisions in the outcome. Is there a pseudo-shuffling algorithm that is fast and memory-efficient?
Upvotes: 0
Views: 238
Reputation: 51904
Reservoir sampling algorithms are designed to sample efficiently from very large data sets that may not fit into memory. TensorFlow provides an implementation.
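To make the idea concrete, here is a minimal plain-Python sketch of the classic reservoir sampling scheme (often called Algorithm R). It is not the TensorFlow implementation; the function name `reservoir_sample` and its `seed` parameter are made up for illustration. It draws `k` items uniformly from a stream in a single pass while holding only `k` items in memory.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items drawn uniformly at random from stream (hypothetical helper).

    Makes one pass over the stream and keeps at most k items in memory.
    If the stream has fewer than k items, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Example: draw a mini-batch of 32 indices from a million without materializing them.
batch = reservoir_sample(range(1_000_000), 32)
```

Note that this yields a random subset rather than a shuffled copy of the whole array, and the order inside the reservoir is not itself uniformly random, so it probably fits best when a random mini-batch per step is enough.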
Upvotes: 1