Lithium

Reputation: 223

Reading in binary file with dtype causes ValueError

I have a binary file that I want to read in with a Python routine. To do so, a dtype object is created that describes how the data is laid out. The dtype object is built from a dictionary of the form {'field1': ..., 'field2': ..., ...}, where each value is a tuple of (data-type, offset) (see the numpy documentation). The error occurs during creation of the dtype if the offset exceeds the range of a C int.

A minimal example to reproduce the error:

import numpy as np

dict_tmp = dict()
offset = 2281832888
dict_tmp['/timedisc/pressure'] = ('(4096, 4096)>f8', offset)
dtype = np.dtype(dict_tmp)

ValueError: integer won't fit into a C int

If I reduce the offset below the range of a 32-bit integer, the error of course vanishes. I already tried casting the offset value to an int64 or uint32 by hand, but that did not work either. As far as I can see, the dtype machinery is part of numpy's multiarray module, and at this point I am a bit lost.

Is there any possibility to load the data and circumvent the error?

Upvotes: 2

Views: 551

Answers (1)

gzahl

Reputation: 36

The dtype offsets are indeed limited to int32 (i.e. < 2^31, see also https://github.com/numpy/numpy/issues/11869#issuecomment-418330815). I guess you want to use this dtype to read from a file via a numpy memmap. This can be achieved with the following snippet:

import numpy as np

f = np.memmap(file)  # 'file' is the path to your binary file
arr1 = np.ndarray(buffer=f, dtype=np.dtype('<f8'), shape=(4096, 4096), offset=2281832888)

Constructing an ndarray like this is actually what memmap does internally, but with this solution the dtype does not have to store the offset; it is passed directly to the ndarray constructor.
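For completeness, here is a minimal sketch of the equivalent one-step call (the filename data.bin is a placeholder for your own file): np.memmap itself accepts shape and offset keywords, so the large offset never has to be encoded in a dtype at all.

import numpy as np

# One-step variant: let memmap apply the byte offset and shape directly.
arr2 = np.memmap('data.bin', dtype='<f8', mode='r', shape=(4096, 4096), offset=2281832888)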

Upvotes: 2
