Why does Octave's huffmandeco throw an error about a too-large index type?

Question

I have a small MATLAB script that I am trying to understand. It doesn't do much: it reads text from a file, then encodes and decodes it with the Huffman functions. But it throws an error while decoding:

"error: out of memory or dimension too large for Octave's index type
error: called from huffmandeco>dict2tree at line 95 column 19"

I don't know why, because I stepped through it in the debugger and didn't see a large index anywhere.

I added the part which calculates p from the input text.

%text is a random input text file in ASCII

%calculate the relative frequency of every Symbol
for i=0:127
    nlet=length(find(text==i));
    p(i+1)=nlet/length(text);
end
symb = 0:127;
dict = huffmandict(symb,p); % Create dictionary
compdata = huffmanenco(fdata,dict); % Encode the data
dsig = huffmandeco(compdata,dict); % Decode the Huffman code

I can only use Octave, not MATLAB, so I don't know whether this is an unexpected error. I'm using Octave 6.2.0 on Windows 10. I also tried the version for large data, but it didn't change anything.
Does anyone know what causes the error in this context?

EDIT: I debugged the code again. In the function huffmandeco I found the following function:

function tree = dict2tree (dict)

  L = length (dict);
  lengths = zeros (1, L);

  ## the depth of the tree is limited by the maximum word length.
  for i = 1:L
    lengths(i) = length (dict{i});
  endfor
  m = max (lengths);

  tree = zeros (1, 2^(m+1)-1)-1;

  for i = 1:L
    pointer = 1;
    word    = dict{i};
    for bit = word
      pointer = 2 * pointer + bit;
    endfor
    tree(pointer) = i;
  endfor

endfunction

The maximum codeword length m in this case is 82, so the function evaluates:
tree = zeros (1, 2^(82+1)-1)-1.
That explains why the error complains about a dimension too large for the index type.
But there must be a solution, or some other error, because the code has been tested before.

beaker · Accepted Answer

I haven't weeded through the code enough to know why yet, but huffmandict is not ignoring zero-probability symbols the way it claims to. Nor have I been able to find a bug report on Savannah, but again I haven't searched thoroughly.
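To see how many zero-probability symbols end up in a 128-entry table, and why the preallocation in dict2tree cannot succeed for m = 82, here is a minimal sketch. It assumes a build with 64-bit indexing, so the index limit is taken to be intmax('int64'). The names txt, n_zero, n_used and tree_elems are mine for illustration, and the sample string stands in for the input file. It only does the arithmetic and does not call the communications package.

```octave
% Stand-in for the asker's input (hypothetical; any ASCII text works here).
txt = double('Random String Input.');

% Relative frequency of each ASCII code 0..127, same result as the loop in the question.
p = histc(txt, 0:127) / numel(txt);

n_zero = sum(p == 0);   % symbols that never occur in the text
n_used = sum(p > 0);    % symbols that actually occur

% Number of elements dict2tree preallocates for a maximum codeword length m.
m = 82;                 % value reported in the question
tree_elems = 2^(m+1) - 1;

% Assumption: the index type is int64, so this is the largest possible dimension.
fits = tree_elems <= double(intmax('int64'));

fprintf("%d of 128 symbols have zero probability\n", n_zero);
fprintf("tree needs about %g elements, fits in index type: %d\n", tree_elems, fits);
```

With only the symbols that occur in the text, the longest codeword can be at most one less than the number of symbols, so the tree stays small.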

A workaround is to limit the symbol list and their probabilities to only the symbols that actually occur. Using containers.Map would be ideal, but in Octave you can do that with a couple of the outputs from unique:

% Create a symbol table of the unique characters in the input string
% and the indices into the table for each character in the string.
[symbols, ~, inds] = unique(textstr);
inds = inds.';   % just make it easier to read

For the string

textstr = 'Random String Input.';

the result is:

>> symbols
symbols =  .IRSadgimnoprtu
>> inds
inds =
 Columns 1 through 19:
    4    6   11    7   12   10    1    5   15   14    9   11    8    1    3   11   13   16   15
 Column 20:
    2

So the first symbol in the input string is symbols(4), the second is symbols(6), and so on.

From there, you just use symbols and inds to create the dictionary and encode/decode the signal. Here's a quick demo script:

textstr = 'Random String Input.';
fprintf("Starting string: %s\n", textstr);

% Create a symbol table of the unique characters in the input string
% and the indices into the table for each character in the string.
[symbols, ~, inds] = unique(textstr);
inds = inds.';   % just make it easier to read

% Calculate the frequency of each symbol in table
% max(inds) == numel(symbols)
p = histc(inds, 1:max(inds))/numel(inds);

dict = huffmandict(symbols, p);
compdata = huffmanenco(inds, dict);
dsig = huffmandeco(compdata, dict);

fprintf("Decoded string: %s\n", symbols(dsig));

And the output:

Starting string: Random String Input.
Decoded string: Random String Input.

To encode strings other than the original input string, you would have to map the characters to symbol indices (ensuring that all symbols in the string are actually present in the symbol table, obviously):

>> [~, s_idx] = ismember('trogdor', symbols)
s_idx =
   15   14   12    8    7   12   14

>> compdata = huffmanenco(s_idx, dict);
>> dsig = huffmandeco(compdata, dict);
>> fprintf("Decoded string: %s\n", symbols(dsig));
Decoded string: trogdor
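If you want to guard against characters that are missing from the symbol table, the first output of ismember tells you which ones were found. This is a rough sketch rather than part of the original script; newstr and found are names I made up for the example.

```octave
% Symbol table built from the original input string, as in the demo script.
symbols = unique('Random String Input.');

% Hypothetical new string to encode.
newstr = 'trogdor';

% found(k) is true when newstr(k) is in symbols; s_idx(k) is its index (0 if missing).
[found, s_idx] = ismember(newstr, symbols);

if ~all(found)
  % A zero index would not be a valid symbol for huffmanenco, so stop early.
  error("Characters not in the symbol table: %s", unique(newstr(~found)));
end
```

After this check, s_idx can be passed to huffmanenco exactly as in the session above.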
