I have a binary test file at http://jmp.sh/VpTZxgQ and I am trying to rewrite some MATLAB code that reads this file in Python.
What I have realised is that MATLAB's fread remembers what has already been read, so each call continues from where the previous one stopped instead of starting at the beginning of the file. How do I get the same behaviour in Python?
clear all; close all;
path = pwd;
ext = 'bin';
stem = 'test';
filename = [stem,'.',ext];
filename = fullfile(path,filename);
fid = fopen(filename,'r');
fread(fid,2,'int16')
fread(fid,32,'char')
fread(fid,2,'int16')
import numpy as np

def fread(filename, n, precision):
    with open(filename, 'rb') as fid:
        data_array = np.fromfile(fid, precision).reshape((-1, 1)).T
    return data_array[0,0:n]

print fread('test.bin', 2, np.int16)
print fread('test.bin', 32, np.str)
print fread('test.bin', 2, np.int16)
Ideally the output of the two versions would be the same, but it is not. In fact, Python gives a value error when I try to set precision to np.str
...
As a bonus question: I'm assuming that reading a binary file requires the user to know how the data was formatted in order to make any sense of it. Is this true?
Upvotes: 3
Views: 3203
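As the comments suggest, you need to open the file once and pass the open file object to every read, which is what the MATLAB code does with its file identifier. The file object keeps track of the current read position, so each call picks up where the previous one stopped: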
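import numpy as np

def fread(fid, nelements, dtype):
    # np.str was only an alias for the builtin str and was removed in
    # NumPy 1.24, so the builtin is used here instead.
    if dtype is str:
        dt = np.uint8  # WARNING: assuming 8-bit ASCII for str!
    else:
        dt = dtype
    # fromfile reads from the current position of fid and advances it
    data_array = np.fromfile(fid, dt, nelements)
    data_array.shape = (nelements, 1)
    return data_array

fid = open('test.bin', 'rb')
print(fread(fid, 2, np.int16))
print(fread(fid, 32, str))
print(fread(fid, 2, np.int16))
fid.close()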
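To make the "remembers what has already been read" behaviour visible, here is a minimal sketch that records the file position after each read. It does not use the test.bin from the question; the temporary file and its contents (two int16 values, four ASCII bytes, two more int16 values) are made up purely for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in file: 2 x int16, 4 ASCII bytes, 2 x int16.
payload = (
    np.array([32, 0], dtype=np.int16).tobytes()
    + b"abcd"
    + np.array([7, 9], dtype=np.int16).tobytes()
)
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(payload)
    demo_path = tmp.name

positions = []
with open(demo_path, "rb") as fid:
    first = np.fromfile(fid, np.int16, 2)   # consumes bytes 0..3
    positions.append(fid.tell())
    text = np.fromfile(fid, np.uint8, 4)    # consumes bytes 4..7
    positions.append(fid.tell())
    last = np.fromfile(fid, np.int16, 2)    # consumes bytes 8..11
    positions.append(fid.tell())

os.remove(demo_path)

print(positions)
print(first, text.tobytes(), last)
```

Because the same open file object is reused, each read starts at the byte where the previous one ended. Reopening the file inside the function, as in the question, resets the position to zero on every call.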
Reading and writing binary data requires the reader and the writer to agree on a format, so yes, you need to know how the data was laid out to make sense of it. As the commenters suggest, endianness may become an issue if you save the binary file on one computer and try to read it on another. If the data is always written and read on machines with the same byte order, you won't run into the issue.
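If the byte order of the file is known, it can be stated explicitly in the dtype instead of relying on the machine's native order. The sketch below is only an illustration: it assumes the first two int16 values of the file (32 and 0) were stored little-endian, and it works on an in-memory byte string rather than on test.bin.

```python
import numpy as np

# Assumed raw bytes of the int16 values 32 and 0 in little-endian order.
raw = bytes([0x20, 0x00, 0x00, 0x00])

little = np.frombuffer(raw, dtype="<i2")  # little-endian int16
big = np.frombuffer(raw, dtype=">i2")     # big-endian int16

print(little)  # [32  0]
print(big)     # [8192    0]
```

The same dtype strings ('<i2', '>i2') can be passed to np.fromfile in place of np.int16, which makes the result independent of the CPU that runs the reader.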
Output for the test.bin:
MATLAB Output         Python+Numpy Output
------------------------------------------------------
ans =
    32                [[32]
     0                 [ 0]]
ans =
    35                [[ 35]
    32                 [ 32]
    97                 [ 97]
   102                 [102]
    48                 [ 48]
    52                 [ 52]
    50                 [ 50]
    95                 [ 95]
    53                 [ 53]
    48                 [ 48]
   112                 [112]
   101                 [101]
   114                 [114]
    99                 [ 99]
    95                 [ 95]
   115                 [115]
   112                 [112]
    97                 [ 97]
   110                 [110]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]
    32                 [ 32]]
ans =
    32                [[32]
     0                 [ 0]]
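The 32 values in the middle block are character codes. As a small sketch, assuming they are plain ASCII (the same assumption the function above makes), they can be turned back into text like this; the codes are copied from the output above.

```python
import numpy as np

# The 32 byte values from the 'char' read, copied from the output above.
codes = np.array(
    [35, 32, 97, 102, 48, 52, 50, 95, 53, 48,
     112, 101, 114, 99, 95, 115, 112, 97, 110] + [32] * 13,
    dtype=np.uint8,
)

label = codes.tobytes().decode("ascii")  # 32-character, space-padded field
print(repr(label.rstrip()))  # '# af042_50perc_span'
```

The trailing spaces are padding that fills the field to its fixed 32-byte width, so rstrip() recovers the actual label.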
Upvotes: 6