Fardin Abdi

Reputation: 1374

Fast pseudo shuffling of large arrays

I have a large array of samples that I want to feed into my deep learning model. Shuffling the array takes a long time. I don't need a perfectly random shuffle, and given the nature of the problem I don't mind a few collisions in the outcome. Is there a pseudo-shuffling algorithm that is fast and memory-efficient?

Upvotes: 0

Views: 238

Answers (1)

jspcal

Reputation: 51904

Reservoir sampling algorithms are designed to draw a uniform random sample from very large data sets in a single pass, without loading the whole data set into memory. There's an implementation in TensorBoard, TensorFlow's visualization toolkit:

https://github.com/tensorflow/tensorboard/blob/master/tensorboard/backend/event_processing/reservoir.py
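For illustration, here is a minimal sketch of the classic reservoir sampling scheme (Algorithm R) in plain Python. It is not the TensorBoard code linked above, and the function name `reservoir_sample` is hypothetical. It makes one pass over any iterable and keeps only `k` items in memory: the first `k` items fill the reservoir, and each later item (at 0-based index `i`) replaces a random slot with probability `k / (i + 1)`, so every item ends up in the sample with equal probability. Note that this selects a random subset of size `k`, not a permutation of the whole array.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from stream (Algorithm R).

    Hypothetical helper for illustration: single pass, O(k) memory.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick j uniformly from 0..i (inclusive); the item is kept
            # only if j lands inside the reservoir, i.e. with prob. k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    sample = reservoir_sample(range(1_000_000), k=1000, rng=random.Random(0))
    print(len(sample))
```

The returned list keeps the first `k` items in their original positions until they are replaced, so if the order of the sample matters (e.g. for mini-batches), calling `rng.shuffle(reservoir)` on the small result is cheap.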

Upvotes: 1
