Reputation: 6029
What's the best way to convert NumPy's recarray to a normal array? I could call .tolist() first and then array() again, but that seems somewhat inefficient.
Example:
>>> import numpy as np
>>> a = np.recarray((2,), dtype=[('x', int), ('y', float), ('z', int)])
>>> a
rec.array([(30408891, 9.2944097561804909e-296, 30261980),
(44512448, 4.5273310988985789e-300, 29979040)],
dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])
>>> np.array(a.tolist())
array([[ 3.04088910e+007, 9.29440976e-296, 3.02619800e+007],
[ 4.45124480e+007, 4.52733110e-300, 2.99790400e+007]])
Upvotes: 22
Views: 14661
Reputation: 7729
Here is a relatively clean solution using pandas:
>>> import numpy as np
>>> import pandas as pd
>>> a = np.recarray((2,), dtype=[('x', int), ('y', float), ('z', int)])
>>> arr = pd.DataFrame(a).to_numpy()
>>> arr
array([[9.38925058e+013, 0.00000000e+000, 1.40380704e+014],
[1.40380704e+014, 6.93572751e-310, 1.40380484e+014]])
>>> arr.shape
(2, 3)
>>> arr.dtype
dtype('float64')
First the data from the recarray are loaded into a pd.DataFrame, then the data are exported using the DataFrame.to_numpy method. As we can see, this method call has automatically converted all of the data to type float64, the common dtype that can hold both the integer and the float columns.
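To make the upcast easier to see, here is a minimal sketch that repeats the conversion with known values instead of uninitialized memory. The sample values are an assumption chosen for illustration; they are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Illustrative values (an assumption), so the output is reproducible
a = np.array([(0, 1.0, 2), (3, 4.0, 5)],
             dtype=[('x', int), ('y', float), ('z', int)]).view(np.recarray)

df = pd.DataFrame(a)      # one column per field: x, y, z
arr = df.to_numpy()       # mixed int/float columns are upcast to a common dtype

print(list(df.columns))   # field names become column labels
print(arr.dtype, arr.shape)
print(arr)
```

Note that this route builds an intermediate DataFrame, so it trades some overhead for readability.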
Upvotes: 3
Reputation: 880359
By "normal array" I take it you mean a NumPy array of homogeneous dtype. Given a recarray, such as:
>>> a = np.array([(0, 1, 2),
...               (3, 4, 5)], [('x', int), ('y', float), ('z', int)]).view(np.recarray)
>>> a
rec.array([(0, 1.0, 2), (3, 4.0, 5)],
dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])
we must first make each column have the same dtype. We can then convert it to a "normal array" by viewing the data with that same dtype (the view is one-dimensional, so call .reshape(len(a), -1) on it to recover one row per record):
>>> a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
array([ 0., 1., 2., 3., 4., 5.])
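The conversion above can be sketched end to end, including the reshape that turns the flat view into one row per record. As an alternative that is not part of the original answer, the sketch also shows numpy.lib.recfunctions.structured_to_unstructured, which should be available in NumPy 1.16 and later. The variable names are illustrative.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([(0, 1, 2), (3, 4, 5)],
             dtype=[('x', int), ('y', float), ('z', int)]).view(np.recarray)

# Approach from the answer: unify the field dtypes, view as float64,
# then reshape so each record becomes one row.
flat = a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
two_d = np.asarray(flat).reshape(len(a), -1)

# Alternative (assumes NumPy >= 1.16): let NumPy pick the common dtype.
# np.asarray drops the recarray wrapper before the conversion.
unstructured = rfn.structured_to_unstructured(np.asarray(a))

print(two_d)
print(unstructured.dtype, unstructured.shape)
```

Both results should hold the same values; the second form avoids spelling out the target dtype by hand.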
astype returns a new NumPy array, so the above requires additional memory in an amount proportional to the size of a. Each row of a requires 4+8+4=16 bytes, while each row of a.astype(...) requires 8*3=24 bytes. Calling view requires no new memory, since view just changes how the underlying data is interpreted.
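The byte counts can be checked directly. This is a small sketch that assumes the 4-byte integer fields shown in the output above, so the dtype is spelled out explicitly instead of relying on the platform's default int.

```python
import numpy as np

# Explicit '<i4' fields so the per-row size matches the 4+8+4 arithmetic
a = np.array([(0, 1, 2), (3, 4, 5)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

converted = a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
viewed = converted.view('<f8')

print(a.dtype.itemsize)          # bytes per row of the original
print(converted.dtype.itemsize)  # bytes per row after astype
print(np.shares_memory(converted, viewed))  # view reuses the astype buffer
```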
a.tolist() returns a new Python list. Each Python number is an object which requires more bytes than its equivalent representation in a NumPy array. So a.tolist() requires more memory than a.astype(...).
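A rough way to compare the two footprints is sketched below. It assumes CPython, where sys.getsizeof reports per-object sizes; the total for the list is only an approximation because shared objects (such as cached small integers) are counted once per reference.

```python
import sys
import numpy as np

a = np.array([(i, float(i), i) for i in range(100)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

converted = a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
rows = a.tolist()   # list of tuples holding Python int/float objects

# Estimate: the list itself, each row tuple, and each number object
list_bytes = sys.getsizeof(rows) + sum(
    sys.getsizeof(row) + sum(sys.getsizeof(v) for v in row) for row in rows
)

print("a.astype(...):", converted.nbytes, "bytes")
print("a.tolist():   ", list_bytes, "bytes (approximate)")
```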
Calling a.astype(...).view(...) is also faster than np.array(a.tolist()):
In [8]: a = np.array(list(zip(*[iter(range(300))]*3)),[('x', int), ('y', float), ('z', int)]).view(np.recarray)
In [9]: %timeit a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
10000 loops, best of 3: 165 us per loop
In [10]: %timeit np.array(a.tolist())
1000 loops, best of 3: 683 us per loop
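Outside IPython, the same comparison can be reproduced with the standard timeit module. The following is a sketch for Python 3; the helper names via_astype and via_tolist are hypothetical, and the absolute timings will depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

rows = list(zip(*[iter(range(300))] * 3))   # 100 tuples of 3 consecutive ints
a = np.array(rows, dtype=[('x', int), ('y', float), ('z', int)]).view(np.recarray)
f8 = [('x', '<f8'), ('y', '<f8'), ('z', '<f8')]

def via_astype():
    # Cast every field to float64, then reinterpret the buffer
    return a.astype(f8).view('<f8')

def via_tolist():
    # Round-trip through Python objects
    return np.array(a.tolist())

for fn in (via_astype, via_tolist):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} us per call")
```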
Upvotes: 18