Reputation: 4179
I'm using numpy's fromfile function to read data from a binary file. The file contains a sequence of values (3 * float32, 3 * int8, 3 * float32) which I want to extract into a numpy ndarray with (rows, 9) shape.
import numpy as np

with open('file/path', 'rb') as my_file:
    my_dtype = np.dtype('>f4, >f4, >f4, >i1, >i1, >i1, >f4, >f4, >f4')
    my_array = np.fromfile(my_file, dtype=my_dtype)

print(my_array.shape)
print(type(my_array[0]))
print(my_array[0])
And this returns:
(38475732,)
<type 'numpy.void'>
(-775.0602416992188, -71.0, -242.5240020751953, 39, 39, 39, 5.0, 2753.0, 15328.0)
How can I get a 2-dimensional ndarray with shape (38475732, 9)?
Why is the returned tuple of type 'numpy.void'?
Redefining question:
If all the values that I want to read from the file were, for example, 4-byte floats, I would use np.dtype('9>f4') and get what I need. But since my binary file contains different types, is there a way of casting all the values into 32-bit floats?
PS: I can do this by using 'struct' to parse the binary file into a list and converting that list into an ndarray afterwards, but this method is much slower than using np.fromfile.
Solution:
Thanks Hpaulj for your answer! In my code I added the following line to convert the structured array returned by numpy's fromfile function into the expected ndarray:
my_array = my_array.astype('f4, f4, f4, f4, f4, f4, f4, f4, f4').view(dtype='f4').reshape(my_array.shape[0], 9)
This returns a (38475732, 9) ndarray.
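For anyone who wants to try the conversion without the original file, here is a minimal sketch of the whole round trip. The two sample records and the temporary file name are made up for illustration; only the dtype and the astype/view/reshape chain come from the question.

```python
import os
import tempfile

import numpy as np

# Big-endian record layout from the question: 3 floats, 3 int8, 3 floats.
my_dtype = np.dtype('>f4, >f4, >f4, >i1, >i1, >i1, >f4, >f4, >f4')

# Two made-up records standing in for the real file contents (assumption).
sample = np.array(
    [(1.5, 2.5, 3.5, 4, 5, 6, 7.5, 8.5, 9.5),
     (-1.0, -2.0, -3.0, -4, -5, -6, -7.0, -8.0, -9.0)],
    dtype=my_dtype,
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.bin')  # hypothetical file name
    sample.tofile(path)
    records = np.fromfile(path, dtype=my_dtype)

# Cast every field to native float32, then reinterpret the now-uniform
# bytes as plain float32 values and reshape to (rows, 9).
float_dtype = np.dtype(', '.join(['f4'] * 9))
my_array_2d = records.astype(float_dtype).view('f4').reshape(records.shape[0], 9)

print(my_array_2d.shape)
print(my_array_2d.dtype)
```

The view step only works because, after astype, every field has the same size and type, so each record is just nine consecutive float32 values.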
Cheers!
Upvotes: 3
Views: 5359
Reputation: 231325
What is my_array[[0]]? my_array is a 1d array of records defined by my_dtype. my_array[0] is one of those records, a tuple. Notice that some entries are floats and some are integers. If it were a row of a 2d array, all entries would be of the same type (e.g. float).
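A small sketch of what that looks like in practice, using a zero-filled placeholder array with the question's dtype rather than the real file:

```python
import numpy as np

my_dtype = np.dtype('>f4, >f4, >f4, >i1, >i1, >i1, >f4, >f4, >f4')
records = np.zeros(2, dtype=my_dtype)  # placeholder data, not the real file

first = records[0]
print(records.ndim)       # one axis: the array is a 1d sequence of records
print(type(first))        # scalar type NumPy uses for a structured record
print(first.dtype.names)  # field names NumPy generated for the dtype string

# Fields within one record keep their own types.
print(first['f0'].dtype.kind, first['f3'].dtype.kind)
```

The record itself is a numpy.void scalar, while each field you pull out of it has an ordinary float or integer type.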
To convert it to a 2d array of floats, you might try:
np.array(my_array.tolist())
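As a rough illustration on a tiny made-up array (not the asker's data), the tolist route looks like this. Passing dtype='f4' is my addition, to get 32-bit floats instead of the default float64:

```python
import numpy as np

# Small structured array with one float field and one integer field.
x = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', '<f8'), ('y', '<i4')])

# tolist() gives a list of Python tuples; np.array rebuilds a regular
# 2d array from them, converting everything to one common type.
as_2d = np.array(x.tolist(), dtype='f4')

print(as_2d.shape)
print(as_2d.dtype)
```

This goes through Python objects for every value, so on an array with tens of millions of records it will likely be slow and memory-hungry compared with the astype/view approach.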
Another way is to convert all the fields to the same type, and reshape it. Something along this line (tested on a different recarray):
x = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', '<f8'), ('y', '<i4')])
x.astype([('x', '<f8'), ('y', '<f8')]).view(dtype='f8').reshape(2,2)
See also: How to convert numpy.recarray to numpy.array?
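Not mentioned in this thread, but newer NumPy releases (1.16 and later, as far as I know) also ship a helper for exactly this conversion, numpy.lib.recfunctions.structured_to_unstructured. A minimal sketch on the same kind of toy array:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

x = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', '<f8'), ('y', '<i4')])

# Convert the structured array to a plain 2d array, casting all fields
# to float32 along the way.
flat = rfn.structured_to_unstructured(x, dtype='f4')

print(flat.shape)
print(flat.dtype)
```

If your NumPy version has it, this avoids spelling out the intermediate all-float dtype by hand.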
Upvotes: 2
Reputation: 48287
Since you require your array to contain different datatypes, you get a structured array, where each element is a record. You can access fields with
>>> my_array.dtype.names
('f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8')
>>> my_array[0]['f1']
-71.0
>>> my_array['f1']
array([-71.], dtype=float32)
A basic ndarray contains elements of the same type. If you need an ndarray with shape (38475732, 9), you have to convert your array to, say, floats. See link above.
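Since each field can be pulled out as an ordinary 1d array, one possible way to build the float array is to cast the fields one at a time and stack them as columns. This is only a sketch on a toy array, not something I have timed on a large file:

```python
import numpy as np

x = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', '<f8'), ('y', '<i4')])

# Extract every field by name, cast it to float32, and use it as a column.
columns = [x[name].astype('f4') for name in x.dtype.names]
stacked = np.column_stack(columns)

print(stacked.shape)
print(stacked.dtype)
```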
I can't say exactly why (I haven't used structured arrays much), but the reason for numpy.void is that your custom type, known to the array, is not broadcast to the individual records. But then what would be the type of a subrecord?
>>> my_array[['f0','f1']][0]
(-775.0602416992188, -71.0)
Upvotes: 0