Jonas Adler

Reputation: 10759

Generate large random text files with python and NumPy

For testing data, I need to quickly create large files of random text. I have one solution, taken from here and given below:

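import random
import string

n = 1024 ** 2  # 1 MiB of text (one byte per ASCII character)
# string.letters exists only in Python 2; Python 3 uses string.ascii_letters
chars = ''.join([random.choice(string.ascii_letters) for i in range(n)])

with open('textfile.txt', 'w') as f:
    f.write(chars)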

My problem is that this takes 653 ms, which is far too slow for my purposes.

Is there a faster way to quickly generate text files with random text?

Upvotes: 2

Views: 8126

Answers (1)

cs95

Reputation: 402463

Create a numpy array of letters:

In [662]: letters = np.array(list(chr(ord('a') + i) for i in range(26))); letters
Out[662]: 
array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
       'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'],
      dtype='<U1')

Use np.random.choice to sample n letters with replacement (it draws random indices in the range [0, 26) and indexes letters with them) to generate random text:

np.random.choice(letters, n)
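To make the "random indices, then index letters" description concrete, here is a minimal sketch that does the two steps by hand with np.random.randint and fancy indexing. It is meant to show the idea, not to reproduce np.random.choice's exact random stream; the names idx and text_array are illustrative.

```python
import numpy as np

# Same 26-letter '<U1' array as in the session above
letters = np.array(list(chr(ord('a') + i) for i in range(26)))
n = 1024 ** 2

# Step 1: draw n random integer indices in the half-open range [0, 26)
idx = np.random.randint(0, len(letters), size=n)

# Step 2: fancy indexing picks the letter at each index
text_array = letters[idx]

print(text_array[:10], text_array.shape)
```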

Timings:

In [664]: n = 1024 ** 2

In [701]: %timeit np.random.choice(letters, n)
100 loops, best of 3: 15.1 ms per loop
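These timings cover only generating the array of letters, not turning it into a text file. A minimal end-to-end sketch, assuming a single ''.join over the array is acceptable (the file name textfile.txt is reused from the question):

```python
import numpy as np

n = 1024 ** 2  # number of characters (about 1 MiB of ASCII text)
letters = np.array(list(chr(ord('a') + i) for i in range(26)))

# Generate n random letters as a '<U1' array
chars = np.random.choice(letters, n)

# Each element is a one-character np.str_ (a str subclass), so str.join accepts them
text = ''.join(chars)

with open('textfile.txt', 'w') as f:
    f.write(text)
```

Note that ''.join iterates over the n array elements in Python, so for large n this step may take a noticeable share of the total time; it is not included in the timings above.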

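Alternatively, reinterpreting the array's buffer with np.fromstring gives a similar timing (note that the binary mode of np.fromstring is deprecated in newer NumPy versions in favour of np.frombuffer):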

In [705]: %timeit np.random.choice(np.fromstring(letters, dtype='<U1'), n)
100 loops, best of 3: 14.1 ms per loop
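If the characters only need to be ASCII, one way to skip the per-character join is to sample single-byte 'S1' elements and write raw bytes. The sketch below swaps in np.frombuffer (the non-deprecated counterpart of the np.fromstring call above) and NumPy's Generator API via np.random.default_rng (NumPy 1.17+); both are substitutions rather than part of the original answer, and this variant has not been timed here.

```python
import string

import numpy as np

n = 1024 ** 2  # number of characters (= bytes) to write

# View the ASCII bytes of the alphabet as 26 single-byte elements b'a' ... b'z'
letters = np.frombuffer(string.ascii_lowercase.encode('ascii'), dtype='S1')

rng = np.random.default_rng()  # newer Generator API

# Sample n letters with replacement; the result is a new contiguous 'S1' array
chars = rng.choice(letters, n)

# tobytes() concatenates the 1-byte elements into one bytes object of length n,
# so no Python-level loop over characters is needed
data = chars.tobytes()

# Binary mode, since data is bytes rather than str
with open('textfile.txt', 'wb') as f:
    f.write(data)
```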

Upvotes: 2
