Reputation: 96398
I have a Python 3 dictionary holding very long lists (30 million integers each). I would like to stitch all these lists into a single numpy array. How can I do this efficiently?
The following
np.array(my_dict.values())
doesn't seem to work (I get array(dict_values([[...], [....]))
as opposed to a flat 1D numpy array).
Upvotes: 5
Views: 23285
Reputation: 5273
try this
list(my_dict.values())
(commentary added)
dict.values()
returns view
not a list. Refer here
Upvotes: 6
Reputation: 180482
from itertools import chain
import numpy as np
chn = chain.from_iterable(d.values())
np.array(list(chn))
Upvotes: 4
Reputation: 58985
In order to get an ordered concatenation based on the keys:
np.array([d[k] for k in sorted(d.keys())]).flatten()
if you don't need any order based on the keys, @Padraic Cunningham's approach was the fastest based on my timings here...
Upvotes: 1
Reputation: 176978
If you're looking for a flat 1d array, you could just use np.concatenate
:
>>> d = {'a': [1, 2, 3, 4, 5], 'b': [1, 2, 3, 4, 5], 'c': [1, 2, 3, 4, 5]}
>>> np.concatenate(list(d.values()))
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
Upvotes: 9
Reputation: 7203
Allocate numpy arrays ahead of time:
my_dict = {0:[0,3,2,1], 1:[4,2,1,3], 2:[3,4,2,1]}
array = numpy.ndarray((len(my_dict), len(my_dict.values()[0]))
then you can insert them into the array like so:
for index, val in enumerate(my_dict.values()):
arr[index] = val
>>> arr
array([[ 0., 3., 2., 1.],
[ 4., 2., 1., 3.],
[ 3., 4., 2., 1.]])
Upvotes: 1