Reputation: 4876
NumPy
arrays are great for both performance and easy use (easier slicing, indexing than lists).
I try to construct a data container out of a NumPy structured array
instead of dict
of NumPy arrays
. The problem is the performance is much worse. About 2.5 times as bad using homogeneous data and about 32 times for heterogeneous data (I'm talking about NumPy
datatypes).
Is there a way to speed the structured array's up? I tried changing the memoryorder from 'c' to 'f' but this didn't have any affect.
Here's my profiling code:
import time
import numpy as np
NP_SIZE = 100000
N_REP = 100
np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}
t0 = time.time()
for i in range(N_REP):
np_homo['a'] += i
t1 = time.time()
for i in range(N_REP):
np_hetro['a'] += i
t2 = time.time()
for i in range(N_REP):
dict_homo['a'] += i
t3 = time.time()
for i in range(N_REP):
dict_hetro['a'] += i
t4 = time.time()
print('Homogeneous Numpy struct array took {:.4f}s'.format(t1 - t0))
print('Hetoregeneous Numpy struct array took {:.4f}s'.format(t2 - t1))
print('Homogeneous Dict of numpy arrays took {:.4f}s'.format(t3 - t2))
print('Hetoregeneous Dict of numpy arrays took {:.4f}s'.format(t4 - t3))
Edit: Forgot to put my timing numbers:
Homogenious Numpy struct array took 0.0101s
Hetoregenious Numpy struct array took 0.1367s
Homogenious Dict of numpy arrays took 0.0042s
Hetoregenious Dict of numpy arrays took 0.0042s
Edit2: I added some additional test case with the timit module:
import numpy as np
import timeit
NP_SIZE = 1000000
def time(data, txt, n_rep=1000):
def intern():
data['a'] += 1
time = timeit.timeit(intern, number=n_rep)
print('{} {:.4f}'.format(txt, time))
np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}
time(np_homo, 'Homogeneous Numpy struct array')
time(np_hetro, 'Hetoregeneous Numpy struct array')
time(dict_homo, 'Homogeneous Dict of numpy arrays')
time(dict_hetro, 'Hetoregeneous Dict of numpy arrays')
results in:
Homogeneous Numpy struct array 0.7989
Hetoregeneous Numpy struct array 13.5253
Homogeneous Dict of numpy arrays 0.3750
Hetoregeneous Dict of numpy arrays 0.3744
The ratios between the runs seem reasonably stable. Using both methods and a different size of the array.
For the offcase it matters: python: 3.4 NumPy: 1.9.2
Upvotes: 11
Views: 3492
Reputation: 231738
In my quick timing tests the difference isn't that large:
In [717]: dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}
In [718]: timeit dict_homo['a']+=1
10000 loops, best of 3: 25.9 µs per loop
In [719]: np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
In [720]: timeit np_homo['a'] += 1
10000 loops, best of 3: 29.3 µs per loop
In the dict_homo
case, the fact that the array is embedded in a dictionary is a minor point. Simple dictionary access like this is fast, basically the same as accessing the array by variable name.
So the first case it basically a test of +=
for a 1d array.
In the structured case, the a
and b
values alternate in the data buffer, so np_homo['a']
is a view that 'pulls out' alternative numbers. So it's not surprising that it would be a bit slower.
In [721]: np_homo
Out[721]:
array([(41111.0, 0.0), (41111.0, 0.0), (41111.0, 0.0), ..., (41111.0, 0.0),
(41111.0, 0.0), (41111.0, 0.0)],
dtype=[('a', '<f8'), ('b', '<f8')])
A 2d array also interleaves the column values.
In [722]: np_twod=np.zeros((10000,2), np.double)
In [723]: timeit np_twod[:,0]+=1
10000 loops, best of 3: 36.8 µs per loop
Surprisingly it's actually a bit slower than the structured case. Using order='F'
or (2,10000) shape speeds it up a bit, but still not quite as good as the structured case.
These are small test times, so I won't make grand claims. But the structured array doesn't look back.
Another time tests, initializing the array or dictionary fresh each step
In [730]: %%timeit np.twod=np.zeros((10000,2), np.double)
np.twod[:,0] += 1
.....:
10000 loops, best of 3: 36.7 µs per loop
In [731]: %%timeit np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
np_homo['a'] += 1
.....:
10000 loops, best of 3: 38.3 µs per loop
In [732]: %%timeit dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}
dict_homo['a'] += 1
.....:
10000 loops, best of 3: 25.4 µs per loop
2d and structured are closer, with somewhat better performance for the dictionary (1d) case. I tried this with np.ones
as well, since np.zeros
can have delayed allocation, but no difference in behavior.
Upvotes: 3