prl900

Reputation: 4179

Using numpy fromfile on binary file returns 1 dimension ndarray

I'm using numpy's fromfile function to read data from a binary file. The file contains a sequence of values (3 * float32, 3 * int8, 3 * float32) which I want to extract into a numpy ndarray with (rows, 9) shape.

import numpy as np

with open('file/path', 'rb') as my_file:
    my_dtype = np.dtype('>f4, >f4, >f4, >i1, >i1, >i1, >f4, >f4, >f4')
    my_array = np.fromfile(my_file, dtype=my_dtype)

    print(my_array.shape)
    print(type(my_array[0]))
    print(my_array[0])

And this returns:

(38475732,)
<type 'numpy.void'>
(-775.0602416992188, -71.0, -242.5240020751953, 39, 39, 39, 5.0, 2753.0, 15328.0)

  1. How can I get a 2-dimensional ndarray with shape (38475732, 9)?

  2. Why is the returned record of type 'numpy.void'?

Redefining question:

If all the values I want to read from the file were, for example, 4-byte floats, I would use np.dtype('9>f4') and get what I need. But since my binary file contains different types, is there a way of casting all the values to 32-bit floats?

PS: I can do this by using 'struct' to parse the binary file into a list and converting that list into an ndarray afterwards, but this method is much slower than using np.fromfile.

Solution:

Thanks, hpaulj, for your answer! In my code I added the following line to convert the structured array returned by numpy's fromfile function into the expected ndarray:

my_array = my_array.astype('f4, f4, f4, f4, f4, f4, f4, f4, f4').view(dtype='f4').reshape(my_array.shape[0], 9)

This returns a (38475732, 9) ndarray.
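For anyone who wants to try the conversion without the original file, here is a minimal sketch under a few assumptions: the two records are made up, and np.frombuffer on an in-memory byte string stands in for np.fromfile. Only the dtype and the astype/view/reshape chain come from the post above.

```python
import numpy as np

# Record layout from the question: 3 big-endian float32, 3 int8, 3 big-endian float32.
my_dtype = np.dtype('>f4, >f4, >f4, >i1, >i1, >i1, >f4, >f4, >f4')

# Two made-up records, serialized to bytes to stand in for the file contents.
records = np.array(
    [(1.5, 2.5, 3.5, 1, 2, 3, 4.5, 5.5, 6.5),
     (7.5, 8.5, 9.5, 4, 5, 6, 10.5, 11.5, 12.5)],
    dtype=my_dtype,
)
raw = records.tobytes()

# np.frombuffer plays the role of np.fromfile here (assumption for the demo).
my_array = np.frombuffer(raw, dtype=my_dtype)

# Cast every field to native float32, then reinterpret the now-uniform
# records as plain float32 values and reshape to (rows, 9).
all_f4 = np.dtype(','.join(['f4'] * 9))
as_2d = my_array.astype(all_f4).view('f4').reshape(my_array.shape[0], 9)

print(as_2d.shape, as_2d.dtype)
```

The astype step is what makes the view safe: the original records are 27 bytes each with mixed field sizes, so viewing them directly as float32 would not line up.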

Cheers!

Upvotes: 3

Views: 5359

Answers (2)

hpaulj

Reputation: 231325

What is my_array[[0]]? my_array is a 1d array of records defined by my_dtype.

my_array[0] is one of those records; it prints like a tuple. Notice that some entries are floats and some are integers. If it were a row of a 2d array, all entries would be of the same type (e.g. float).

To convert it to a 2d array of floats, you might try:

np.array(my_array.tolist())
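A small sketch of the tolist route, using a made-up three-field array rather than the asker's data. Note that without an explicit dtype the result would be float64, and that tolist builds a Python tuple per record first, so it may be slow and memory-hungry on tens of millions of rows.

```python
import numpy as np

# Hypothetical miniature of the asker's layout: float32, int8, float32.
small = np.array([(1.5, 2, 3.5), (4.5, 5, 6.5)], dtype='>f4, >i1, >f4')

# tolist() yields a list of plain Python tuples; np.array then builds
# a regular 2d array from them. dtype='f4' keeps the result at 32 bits.
as_2d = np.array(small.tolist(), dtype='f4')

print(as_2d.shape, as_2d.dtype)
```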

Another way is to convert all the fields to the same type and then reshape the result. Something along these lines (tested on a different structured array):

import numpy as np

x = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', '<f8'), ('y', '<i4')])
x.astype([('x', '<f8'), ('y', '<f8')]).view(dtype='f8').reshape(2, 2)

See also: How to convert numpy.recarray to numpy.array?
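As a possible alternative that is not part of the original answer: NumPy 1.16 and later ship a helper, numpy.lib.recfunctions.structured_to_unstructured, that does the cast-and-flatten in one call. A sketch on the same two-record example:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

x = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', '<f8'), ('y', '<i4')])

# Convert all fields to float64 and return a regular (rows, fields) array.
flat = rfn.structured_to_unstructured(x, dtype=np.float64)

print(flat.shape, flat.dtype)
```

For the array in the question, passing dtype=np.float32 should give the (rows, 9) float32 result directly.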

Upvotes: 2

alko

Reputation: 48287

Since you require your array to contain different datatypes, you get a structured array, where each element is a record. You can access fields with:

>>> my_array.dtype.names
('f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8')
>>> my_array[0]['f1']
-71.0
>>> my_array['f1']
array([-71.], dtype=float32)

A basic ndarray contains elements of the same type; if you need an ndarray with shape (38475732, 9), you have to convert your array to, say, floats. See link above.

I can't say exactly why (I haven't used structured arrays much), but the reason for numpy.void is that your custom dtype, which is known to the array, is not propagated to the individual records. And what would the type of a sub-record be?
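To illustrate field access and conversion together, here is a rough sketch that pulls each field out by name and stacks the columns into a plain float array. The three-field array is made up for the example; the default field names f0, f1, ... are the ones NumPy assigns to a comma-separated dtype string.

```python
import numpy as np

# Hypothetical miniature of the asker's layout: float32, int8, float32.
small = np.array([(1.5, 2, 3.5), (4.5, 5, 6.5)], dtype='>f4, >i1, >f4')

# Each small[name] is a 1-d array holding one field for every record.
columns = [small[name] for name in small.dtype.names]

# Stack the fields side by side and cast the result to native float32.
as_2d = np.column_stack(columns).astype('f4')

print(small.dtype.names)
print(as_2d.shape, as_2d.dtype)
```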

>>> my_array[['f0','f1']][0]
(-775.0602416992188, -71.0)
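On the numpy.void question, a short sketch on a made-up two-field array suggests what is going on: numpy.void is the generic scalar type NumPy uses for a single structured record, and the field layout lives on the record's dtype rather than on its Python type.

```python
import numpy as np

# Made-up two-field structured array, for illustration only.
arr = np.array([(1.5, 2), (3.5, 4)], dtype='>f4, >i1')

rec = arr[0]
print(type(rec) is np.void)     # the Python type is the generic np.void
print(rec.dtype == arr.dtype)   # but the record still carries the full dtype
print(rec.dtype.names)

# A multi-field selection yields records too, with a narrower dtype.
sub = arr[['f0']][0]
print(type(sub) is np.void, sub.dtype.names)
```

So a sub-record is again a numpy.void; only its dtype differs.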

Upvotes: 0
