Reputation: 1015
I have to process thousands of binary files (16 MB each) by reading them in pairs and creating a bit-level data structure (usually a 1x134217728 array, one element per bit of a 16 MB file) so that I can process them at the bit level.
Currently I am doing this the following way:
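conv = @(c) uint8(bitget(c,1:32));            % bits 1..32 of one word, least significant first
measurement = NaN(1,(sizeOfMeasurements*8));  %(1,134217728); overwritten below
fid = fopen(fileName, 'rb');
byteContent = fread(fid,'uint32');            % 4194304 words for a 16 MB file, returned as double
fclose(fid);
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);  % one call per word
measurement=[bitRepresentation1{:}];          % 1x134217728 uint8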
I then replaced fopen with memmapfile, as below:
m = memmapfile(fileName,'Format',{'uint32', [4194304 1], 'byteContent'});  % 4194304 uint32 words = 16 MB
byteContent = m.Data.byteContent;   % the memmapfile property is Data (capital D)
byteContent = double(byteContent);
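A minimal sketch of a memmapfile variant, assuming the file length is a multiple of 4 bytes: passing a plain type name as Format maps the whole file without hard-coding the element count, and the values can stay uint32 because bitget accepts integer types directly. The temporary file and its contents are hypothetical, only there to make the example self-contained.

```matlab
% Hypothetical test file so the sketch runs on its own.
fileName = [tempname '.bin'];
fid = fopen(fileName, 'w');
fwrite(fid, uint32([6 1]), 'uint32');
fclose(fid);

% With a single type name as Format (and the default Repeat of Inf),
% m.Data is a column vector covering the whole file.
m = memmapfile(fileName, 'Format', 'uint32');
byteContent = m.Data;      % uint32 column vector; no conversion to double needed

% bitget works directly on the uint32 values.
firstWordBits = bitget(byteContent(1), 1:32);

clear m                    % release the mapping before deleting the file
delete(fileName);
```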
I timed the individual statements with tic/toc, and it turns out that the bottleneck is:
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false); % see first line of code for conv
Is there a more efficient way to transform byteContent into an array that stores one bit per element (i.e., a bit representation of byteContent)?
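To make the target layout concrete, here is a small sketch of what "one bit per element" means with bitget: bit position 1 is the least significant bit, so each word's bits come out least significant first. The value 6 is only an illustrative input.

```matlab
% Illustrative input: 6 is 110 in binary.
word = uint32(6);

% A vector of bit positions gives one output element per position,
% least significant bit first; the result keeps the class of word.
bits = bitget(word, 1:8);
% bits is uint32([0 1 1 0 0 0 0 0])
```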
Upvotes: 2
Views: 114
Reputation: 24169
Several things seem to provide further improvement on Rody's suggestion:
1. Using a nested function instead of the anonymous function conv.
2. Converting to logical using ~~ instead of uint8.
3. Using cell2mat instead of [bitRepresentation{:}]'.
The result:
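function measurement = q40863898(filename)
    % Read the whole file as 64-bit unsigned words.
    fid = fopen(filename, 'rb');
    bitContent = fread(fid,'*ubit64');
    fclose(fid);
    % One call per bit position; each cell holds an N-by-1 logical column.
    bitRepresentation = arrayfun(@convert, 1:64, 'UniformOutput', false);
    % N-by-64 -> 64-by-N -> 1-by-64N row (word by word, least significant bit first).
    measurement = reshape(cell2mat(bitRepresentation).',[],1).';
    function out = convert(ii)
        % Nested function: sees bitContent from the enclosing function.
        out = ~~(bitget(bitContent, ii, 'uint64'));
    end
end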
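The first two changes can be checked on a toy input. This sketch, with three uint64 values chosen only for illustration, shows that ~~ yields logical values and that, for a 1-by-K cell of equal-height columns, cell2mat builds the same matrix as [c{:}].

```matlab
% Small illustrative input: three 64-bit words.
x = uint64([1; 2; 3]);

% One cell per bit position; each cell holds a 3-by-1 column.
asUint8   = arrayfun(@(ii) uint8(bitget(x, ii)), 1:64, 'UniformOutput', false);
asLogical = arrayfun(@(ii) ~~bitget(x, ii), 1:64, 'UniformOutput', false);

% ~~ (double negation) turns the 0/1 values into logical.
assert(islogical(asLogical{1}));

% cell2mat and [c{:}] build the same 3-by-64 matrix here.
assert(isequal(cell2mat(asUint8), [asUint8{:}]));

% Both variants hold the same 0/1 pattern.
assert(isequal(uint8(cell2mat(asLogical)), cell2mat(asUint8)));
```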
Benchmark results (seconds, as reported by timeit):
Rody's vectorized method: 0.87783
Rody's loop method: 2.37
Dev-iL's method: 0.68387
The benchmark code:
function q40863898(filename)
    %% Common code:
    fid = fopen(filename, 'rb');
    bitContent = fread(fid,'*ubit64');
    fclose(fid);
    %% Verification:
    ref = Rody1();
    res = {Rody2(), uint8(Devil1())};
    assert(isequal(ref,res{1}));
    assert(isequal(ref,res{2}));
    %% Benchmark:
    disp(['Rody''s vectorized method: ' num2str(timeit(@Rody1))]);
    disp(['Rody''s loop method: ' num2str(timeit(@Rody2))]);
    disp(['Dev-iL''s method: ' num2str(timeit(@Devil1))]);
    %% Functions:
    function measurement = Rody1()
        conv = @(ii) uint8(bitget(bitContent, ii));
        bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);
        measurement = [bitRepresentation{:}]';
        measurement = measurement(:).';
    end
    function measurement = Rody2()
        sz = 64 * size(bitContent,1);
        measurement = zeros(1, sz, 'uint8');
        weave = 1:64:sz;
        for ii = 1:64
            measurement(weave + ii - 1) = uint8(bitget(bitContent, ii));
        end
    end
    function measurement = Devil1()
        bitRepresentation = arrayfun(@convert, 1:64, 'UniformOutput', false);
        measurement = reshape(cell2mat(bitRepresentation).',[],1).';
        function out = convert(ii)
            out = ~~(bitget(bitContent, ii, 'uint64'));
        end
    end
end
Upvotes: 2
Reputation: 38042
Your arrayfun calls conv once per 32-bit word, i.e. 4194304 times per 16 MB file. Instead, let bitget handle the looping over all the numbers, and loop only over the bits:
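fid = fopen(fileName, 'rb');
bitContent = fread(fid,'*ubit64');        % N-by-1 uint64, 64 bits per element
fclose(fid);
conv = @(ii) uint8(bitget(bitContent, ii));                        % bit ii of every word at once
bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);  % only 64 calls
measurement = [bitRepresentation{:}]';    % 64-by-N: column k holds the bits of word k
measurement = measurement(:).';           % 1-by-64N row, word by word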
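To see how the transpose and (:) interleave the bits word by word, here is the same sequence on a toy input: two uint8 values with 8 bit positions standing in for 64. The input values are only illustrative.

```matlab
% Illustrative input: two 8-bit words.
x = uint8([1; 2]);

% Same steps as above, with 8 bit positions instead of 64.
conv = @(ii) uint8(bitget(x, ii));
bitRepresentation = arrayfun(conv, 1:8, 'UniformOutput', false);
measurement = [bitRepresentation{:}]';   % 8-by-2: column k holds the bits of word k
measurement = measurement(:).';          % 1-by-16: bits of word 1, then bits of word 2
% measurement is uint8([1 0 0 0 0 0 0 0  0 1 0 0 0 0 0 0])
```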
EDIT: you can also try a direct loop:
fid = fopen(fileName, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);
sz = 64 * size(bitContent,1);              % total number of bits
measurement3 = zeros(1, sz, 'uint8');
weave = 1:64:sz;                           % first position of each word's 64-bit slot
for ii = 1:64
    measurement3(weave + ii - 1) = uint8(bitget(bitContent, ii));  % bit ii of every word
end
On my system, though, that is (surprisingly) slower than arrayfun... but my MATLAB version is from the stone age, so your mileage may vary. Give it a try.
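A small sketch, using hypothetical toy data (three uint8 words, with nBits = 8 standing in for 64), that checks the weave-index loop lays out the bits exactly like the arrayfun version:

```matlab
% Illustrative input: three 8-bit words instead of 64-bit ones.
x = uint8([5; 0; 255]);
nBits = 8;

% Loop version, as in the answer with 64 replaced by nBits.
sz = nBits * size(x, 1);
loopResult = zeros(1, sz, 'uint8');
weave = 1:nBits:sz;                 % [1 9 17]: first slot of each word
for ii = 1:nBits
    % bit ii of every word goes to offset ii-1 within that word's slot
    loopResult(weave + ii - 1) = uint8(bitget(x, ii));
end

% arrayfun version for comparison.
c = arrayfun(@(ii) uint8(bitget(x, ii)), 1:nBits, 'UniformOutput', false);
vecResult = [c{:}]';
vecResult = vecResult(:).';

assert(isequal(loopResult, vecResult));
```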
Upvotes: 5