Pastafarianist

Reputation: 893

Allocating large arrays in memory with Python

The code

import array, itertools
a = array.array('B', itertools.repeat(0, 3715948544))

takes almost 7 minutes to run on my machine (6m44s). The computer has 8 GB of RAM and runs Linux with CPython 3.4.3. How can I obtain an array-like object with 1-byte unsigned int entries faster, preferably using the Python standard library? NumPy can allocate it instantly (in less than 1 millisecond).
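For reference, the NumPy version I'm comparing against looks like this (size reduced here for illustration; the real call uses 3715948544):

```python
import numpy as np

# np.zeros returns a zero-filled array almost instantly, presumably
# because the allocator can hand back already-zeroed memory pages
# rather than writing each byte in a Python-level loop.
n = 10**6  # reduced from 3715948544 for illustration
a = np.zeros(n, dtype=np.uint8)
print(len(a), a.nbytes)  # one byte per entry
```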

Upvotes: 6

Views: 3717

Answers (3)

Sven Marnach

Reputation: 602715

If you really can't use NumPy, you can see how far you get with the built-in bytearray:

a = bytearray(3715948544)

This should finish in a couple of seconds at most.
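A bytearray already behaves like an array of 1-byte unsigned ints, supporting indexing, slicing, and item assignment. A quick sketch at a reduced size:

```python
# Sketch: bytearray as a mutable array of unsigned bytes.
# Size reduced from 3715948544 for illustration.
n = 10**6
a = bytearray(n)  # zero-initialized, allocates quickly
a[0] = 255        # item assignment; values must be in range 0..255
print(len(a), a[0], a[1])
```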

Upvotes: 4

wflynny

Reputation: 18551

At first I thought numpy would be fastest, but as Sven points out, bytearray is pretty quick even at 10,000 elements. Try your luck with bytearray at 3.7 billion.

In [1]: import numpy as np

In [2]: import array, itertools

In [3]: %timeit array.array('B', itertools.repeat(0, 10000))
1000 loops, best of 3: 456 µs per loop

In [4]: %timeit np.zeros(10000, dtype='uint8')
1000000 loops, best of 3: 924 ns per loop

In [5]: %timeit bytearray(10000)
1000000 loops, best of 3: 328 ns per loop

Upvotes: 1

user2357112

Reputation: 282026

import array
a = array.array('B', [0]) * 3715948544

Sequence multiplication, analogous to how you'd create a giant list of zeros. Note that anything you want to do with this giant array is probably going to be as slow as your initial attempt to create it.
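A sketch of the multiplication idiom at a reduced size, showing that the result keeps the same typecode and zero entries:

```python
import array

# Multiplying a one-element array repeats it at C level, like
# [0] * n for lists, avoiding the per-item Python-level iteration
# that makes itertools.repeat slow here.
# Size reduced from 3715948544 for illustration.
n = 10**6
a = array.array('B', [0]) * n
print(len(a), a.typecode)
```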

Upvotes: 6
