user1192748

Reputation: 1015

Speeding up the processing of large binary files

I have to process thousands of binary files (16 MB each) by reading them in pairs and converting each one into a bit-level data structure (usually a 1x134217728 array), so that I can work on the individual bits.

Currently I do this as follows:

conv = @(c) uint8(bitget(c,1:32));
measurement = NaN(1, sizeOfMeasurements*8);   % (1,134217728)
fid = fopen(fileName, 'rb');
byteContent = fread(fid,'uint32');
fclose(fid);
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
measurement=[bitRepresentation1{:}];

To speed this up, I replaced fopen/fread with memmapfile, as below:

m = memmapfile(fileName,'Format',{'uint32', [4194304 1], 'byteContent'});
byteContent = m.Data.byteContent;
byteContent = double(byteContent);

I printed timing information (using tic/toc) for the individual instructions and it turns out that the bottleneck is:

bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);  % see first line of code for conv

Are there more efficient ways of transforming byteContent into an array that stores a bit per index (i.e. that is a bit representation of byteContent)?

Upvotes: 2

Views: 114

Answers (2)

Dev-iL

Reputation: 24169

Several things that seem to provide further improvement on Rody's suggestion:

  1. (minor:) Using a local function instead of a function handle for conv.
  2. (major:) Converting the result of conv to logical using ~~ instead of uint8.
  3. (major:) cell2mat instead of [bitRepresentation{:}]'.

The result:

function q40863898(filename)

  fid = fopen(filename, 'rb');
  bitContent = fread(fid,'*ubit64');
  fclose(fid);

  bitRepresentation = arrayfun(@convert, 1:64, 'UniformOutput', false);    
  measurement = reshape(cell2mat(bitRepresentation).',[],1).';

  function out = convert(ii)
    out = ~~(bitget(bitContent, ii, 'uint64'));
  end

end
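
As a small illustration of points 2 and 3, here is a minimal sketch on made-up values (the variables x, y and c are hypothetical and not part of the function above). It only checks that ~~ yields a logical array and that cell2mat gives the same matrix as [c{:}] for equally sized columns; it says nothing about speed.

```matlab
x = uint8([0 1 1 0]);          % stand-in for bitget output (illustrative values)
y = ~~x;                       % double negation converts the result to logical

disp(class(y))                 % class of the converted array
disp(isequal(y, logical(x)))   % same values as an explicit logical() cast

c = {y(:), ~y(:)};             % cell array of equally sized columns
disp(isequal(cell2mat(c), [c{:}]))   % both concatenations give the same 4x2 matrix
```

A logical array stores one byte per element, just like uint8, so the gain reported in the benchmark comes from the conversion itself rather than from a smaller result.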

Benchmark result (on MATLAB R2016b, Win10 x64, 14MB file):

Rody's vectorized method: 0.87783
Rody's loop method: 2.37
Dev-iL's method: 0.68387

Benchmark code:

function q40863898(filename)
  %% Common code:
  fid = fopen(filename, 'rb');
  bitContent = fread(fid,'*ubit64');
  fclose(fid);
  %% Verification:
  ref = Rody1();
  res = {Rody2(), uint8(Devil1())};  
  assert(isequal(ref,res{1}));
  assert(isequal(ref,res{2}));
  %% Benchmark: 
  disp(['Rody''s vectorized method: ' num2str(timeit(@Rody1))]);
  disp(['Rody''s loop method: ' num2str(timeit(@Rody2))]);
  disp(['Dev-iL''s method: ' num2str(timeit(@Devil1))]);
  %% Functions:
  function measurement = Rody1()
    conv = @(ii) uint8(bitget(bitContent, ii));
    bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);
    measurement = [bitRepresentation{:}]';
    measurement = measurement(:).';    
  end

  function measurement = Rody2()
    sz = 64 * size(bitContent,1);    
    measurement = zeros(1, sz, 'uint8');
    weave = 1:64:sz;
    for ii = 1:64
        measurement(weave + ii - 1) = uint8(bitget(bitContent, ii));
    end    
  end

  function measurement = Devil1()
    bitRepresentation = arrayfun(@convert, 1:64, 'UniformOutput', false);
    measurement = reshape(cell2mat(bitRepresentation).',[],1).';

    function out = convert(ii)
      out = ~~(bitget(bitContent, ii, 'uint64'));
    end
  end

end

Upvotes: 2

Rody Oldenhuis

Reputation: 38042

Let bitget handle the loop over all numbers (it accepts a whole array at once), and loop over the 64 bit positions yourself:

fid = fopen(fileName, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);

conv = @(ii) uint8(bitget(bitContent, ii));
bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);

measurement = [bitRepresentation{:}]';
measurement = measurement(:).';
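
To see the bit order this produces, here is a minimal sketch on a tiny input. It assumes uint8 values and 8 bit positions instead of uint64 and 64, and the values 5 and 3 are made up for illustration:

```matlab
% Tiny stand-in for the file contents (illustrative values, not real data)
bitContent = uint8([5; 3]);          % 5 = 00000101, 3 = 00000011

% One bitget call per bit position; each call handles all numbers at once
conv = @(ii) uint8(bitget(bitContent, ii));
bitRepresentation = arrayfun(conv, 1:8, 'UniformOutput', false);

% 2x8 matrix -> transpose to 8x2 -> flatten column-wise into a 1x16 row
measurement = [bitRepresentation{:}]';
measurement = measurement(:).';

disp(measurement)
% bits of 5 (least significant first), followed by the bits of 3:
% 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0
```

The transpose before flattening is what keeps the bits of each number together; without it, the output would list bit 1 of every number, then bit 2 of every number, and so on.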

EDIT: You can also try a direct loop:

fid = fopen(fileName, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);

sz = 64 * size(bitContent,1);    
measurement3 = zeros(1, sz, 'uint8');
weave = 1:64:sz;
for ii = 1:64
    measurement3(weave + ii - 1) = uint8(bitget(bitContent, ii));
end

On my system, that is (surprisingly) slower than arrayfun. However, my MATLAB version is from the stone age, so your mileage may vary. Give it a try.
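
The weave indexing in the direct loop can be checked on a tiny case. This is a sketch only, again assuming uint8 values and 8 bit positions rather than 64; nBits is a hypothetical name introduced for the illustration:

```matlab
bitContent = uint8([5; 3]);              % illustrative values, not real data
nBits = 8;                               % 64 in the loop above
sz = nBits * size(bitContent, 1);        % 16 output slots in total
measurement = zeros(1, sz, 'uint8');
weave = 1:nBits:sz;                      % [1 9]: slot where each number's bits start

for ii = 1:nBits
    % bit ii of every number goes to (start slot of that number) + ii - 1
    measurement(weave + ii - 1) = bitget(bitContent, ii);
end

disp(weave)
disp(measurement)
```

Each pass fills one bit position for all numbers at once, so the loop runs 64 times regardless of the file size.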

Upvotes: 5
