michalkvasnicka

Reputation: 119

MATLAB: decoding integer sequences into strings ... speed optimization

I have the following problem:

I need to decode integer sequences "c" into char string messages "m" using the following association:

  numpos = 10 % ( = size(c,2)/2)
  c = [3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3]

Each row of "c" contains 2*numpos integers: the first numpos values are indices into

types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'} 

and the second numpos values are used only when the selected type contains the character '@', in which case ":value" is inserted just before the '@', like this:

  m = ' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6' 
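To make the association concrete, here is a minimal loop-based sketch that decodes a single row. The names `row`, `k`, `t` and `at` are illustrative, and it uses num2str where the code below uses sprintf. It is meant as a readable reference for checking faster versions, not as a fast solution.

```matlab
% Decode one row of c with a plain loop (illustrative reference, not optimized).
types  = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};
row    = [3 4 1 1 4 2 5 2 3 3, 1 1 1 1 2 2 2 3 3 3];
numpos = numel(row)/2;            % first half: type indices, second half: values
m = '';
for k = 1:numpos
    t  = types{row(k)};           % selected type, e.g. 'c@6'
    at = strfind(t, '@');         % index of '@', empty if the type has none
    if isempty(at)
        % a type without '@' is copied unchanged
        m = [m ' ' t];
    else
        % insert ':<value>' immediately before the '@'
        m = [m ' ' t(1:at-1) ':' num2str(row(numpos+k)) t(at:end)];
    end
end
```

After the loop, `m` holds the same text as the message shown above.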

My current solution is as follows:

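  function m = c2m(c,types)

  numpos = size(c,2)/2;      % number of (type, value) pairs per row
  numlines = size(c,1);      % number of rows (messages) to decode

  % one sprintf format per type, e.g. 'b@2' -> ' b:%d@2' (leading space as separator)
  F = cellfun(@(f) [' ' f], strrep(types,'@',':%d@'),'unif',0);
  % format every (type, value) pair; this calls sprintf numlines*numpos times
  m = arrayfun(@(f,k) sprintf(f{1},k),F(c(:,1:numpos)),c(:,numpos+(1:numpos)),'unif', 0);
  % concatenate the pieces of each row into one char vector
  m = arrayfun(@(i) horzcat(m{i,:}), (1:numlines)', 'unif', 0);

  end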

and the testing code is as follows:

  numlines = 10;
  c = repmat([3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3],numlines,1);
  types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};

  m = c2m(c,types);

  m =

    10×1 cell array

      {' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
      {' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
      {' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
      {' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
      {' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
      {' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
      {' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
      {' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
      {' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
      {' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}

The code is still too slow for my needs, and I am looking for any speed-up. In this case, most of the CPU time is spent in the built-in function "sprintf".
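The format strings behind those sprintf calls can be inspected directly. This short sketch builds the same formats as F in c2m, but uses strcat in place of the cellfun call (strcat keeps whitespace in cell-array inputs, so the leading space survives). Each (type, value) pair costs one sprintf call; a type without '@' has no conversion specifier, so sprintf returns it unchanged and ignores the value.

```matlab
types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};
% same formats as F in c2m: '@' becomes ':%d@' and a leading space is added
F = strcat({' '}, strrep(types, '@', ':%d@'));   % {' a' ' b:%d@2' ' c:%d@6' ...}
s1 = sprintf(F{3}, 1);   % ' c:1@6'
s2 = sprintf(F{1}, 1);   % ' a' (no conversion specifier, so the value is ignored)
% c2m makes one such call per element of c(:,1:numpos), i.e. numlines*numpos calls
```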

Typical realistic problem sizes are:

   numpos ~ 30 ... 60
   numlines ~ 1e4 ... 1e5
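To benchmark at these sizes, random input can be generated instead of repeating a single row. A minimal sketch, assuming values no larger than 10 (`maxval` is a hypothetical parameter) and picking numpos = 45 and numlines = 5e4 from inside the ranges above; the commented timeit line assumes c2m is on the path.

```matlab
% Random benchmark input at a realistic size (sizes and maxval are assumptions).
types    = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};
numpos   = 45;                                  % inside the 30 ... 60 range
numlines = 5e4;                                 % inside the 1e4 ... 1e5 range
maxval   = 10;                                  % hypothetical largest value
rng(0);                                         % reproducible random numbers
idx  = randi(numel(types), numlines, numpos);   % indices into types
vals = randi(maxval, numlines, numpos);         % values inserted before '@'
c = [idx, vals];                                % numlines-by-(2*numpos) input
% t = timeit(@() c2m(c, types));                % seconds per call, if c2m is available
```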

Any idea?

Upvotes: 3

Views: 101

Answers (3)

matlabbit

Reputation: 706

In R2016b, MATLAB shipped some new text functions that make this easy, as well as the new string data type that makes it fast.

 function m = c2m_new(c,types, numlines)

     types = string(types);

     num_values = size(c,2)/2;

     a = c(:,1:num_values);
     b = c(:,(num_values+1):end);

     m = types(a);
     m = insertBefore(m,"@", ":" + b);
     m = join(m,2);
 end

>> numlines = 10;
>> c = repmat([3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3],numlines,1);
>> types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};
>> c2m_new(c,types,numlines)

ans = 

  10×1 string array

    "c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
    "c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
    "c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
    "c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
    "c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
    "c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
    "c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
    "c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
    "c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
    "c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
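The three steps of c2m_new can be traced on a shortened row (only the first three pairs of the example, chosen for illustration). The sketch assumes R2016b or later for string, insertBefore and join.

```matlab
types = string({'a' 'b@2' 'c@6' 'd@10' 'e@11'});
a = [3 4 1];                        % type indices (first three of the example row)
b = [1 1 1];                        % matching values
m = types(a);                       % ["c@6" "d@10" "a"]
m = insertBefore(m, "@", ":" + b);  % ["c:1@6" "d:1@10" "a"]; "a" has no '@' and is unchanged
m = join(m, 2);                     % "c:1@6 d:1@10 a" (space is the default delimiter)
```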

Looking at performance:

>> numlines = 1E4;
>> c = repmat([3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3],numlines,1);
>> types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};

% My solution
>> tic; for i = 1:10; c2m_new(c,types, numlines); end; toc
Elapsed time is 0.669311 seconds.

% michalkvasnicka's solution
>> tic; for i = 1:10; c2m(c,types, numlines); end; toc
Elapsed time is 23.643991 seconds.

% gnovice's solution
>> tic; for i = 1:10; c2m_gnovice(c,types, numlines); end; toc
Elapsed time is 8.960392 seconds.

Upvotes: 2

gnovice

Reputation: 125854

Here's an alternative for your c2m function that is 2 to 3 times faster for the typical ranges of numpos and numlines you list above:

function m = c2m(c, types)
  numpos = size(c, 2)/2;
  [pre, post] = strtok(types(c(:, 1:numpos)), '@');
  mid = strsplit(sprintf(' :%i', 1:max(max(c(:, numpos+1:2*numpos)))));
  mid = mid(c(:, numpos+1:2*numpos).*~cellfun(@isempty, post)+1);
  m = cellstr(char(join(strcat(pre, mid, post))));
end

First, the types selected by the first half of c are split at the '@' using strtok. Then a cell array mid is created containing the strings {'' ':1' ':2' ... ':N'}, where N is the maximum value found in the second half of c. This allows us to avoid costly conversion functions applied to the whole matrix (like sprintf, num2str, int2str, etc.) by simply indexing into mid to get the string we want. The index is just the right half of c multiplied by a logical array indicating whether '@' is present (obtained with cellfun), plus 1.
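The lookup-table step can be seen on a small hand-made example; `vals` and `hasAt` below are illustrative stand-ins for the right half of c and for `~cellfun(@isempty, post)`.

```matlab
vals  = [1 1 3; 2 2 3];              % stand-in for the right half of c
hasAt = logical([1 0 1; 1 1 0]);     % stand-in for ~cellfun(@isempty, post)
% strsplit of ' :1 :2 :3' gives {'' ':1' ':2' ':3'}; the leading space yields the empty entry
mid = strsplit(sprintf(' :%i', 1:max(vals(:))));
% index 1 selects '' (no '@'), index v+1 selects ':v'
out = mid(vals .* hasAt + 1);        % {':1' '' ':3'; ':2' ':2' ''}
```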

Finally, the three different strings (pre, mid, and post) are concatenated using strcat, collected row-wise into strings using join (present since R2016b), then converted to a cell array of character arrays with cellstr and char.
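The split and concatenation steps can be checked on a single row of three items. The mid pieces here are written out by hand, standing in for the indexed lookup table.

```matlab
% split the selected types at '@' (strtok keeps the '@' in the remainder)
[pre, post] = strtok({'c@6' 'a' 'b@2'}, '@');   % pre = {'c' 'a' 'b'}, post = {'@6' '' '@2'}
mid = {':1' '' ':3'};                            % hand-written ':<value>' pieces
s = strcat(pre, mid, post);                      % {'c:1@6' 'a' 'b:3@2'}
row = char(join(s));                             % 'c:1@6 a b:3@2'
```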

Testing it with these values:

numpos = 10;
numlines = 10;
c = repmat([3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3], numlines, 1);
types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};

We get the desired result:

m =

  10×1 cell array

    'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
    'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
    'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
    'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
    'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
    'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
    'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
    'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
    'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
    'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'

Upvotes: 0

marsei

Reputation: 7751

Here is an idea to start experimenting with.

numpos = 10 % ( = size(c,2)/2)
c = [3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3];
types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'} 

Then

types_mod = {'a@0' 'b@2' 'c@6' 'd@10' 'e@11'}'; % so that every item contains an '@'
types_mod_split = split(types_mod,'@');
m = strcat(types_mod_split(c(1:10),1), repmat({':'},10,1), num2str(c(11:20)'), repmat({'@'},10,1), types_mod_split(c(1:10),2))'

which gives

m =

  1×10 cell array

  Columns 1 through 10

    {'c:1@6'} {'d:1@10'} {'a:1@0'} {'a:1@0'} {'d:2@10'} {'b:2@2'} {'e:2@11'} {'b:3@2'} {'c:3@6'} {'c:3@6'}
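Because the dummy '@0' makes plain types such as 'a' come out as 'a:1@0' instead of 'a', one possible follow-up (an assumption, not part of the answer above) is to strip that suffix with regexprep and then join the pieces with spaces. This assumes no genuine type ends in '@0'.

```matlab
m = {'c:1@6' 'd:1@10' 'a:1@0' 'a:1@0' 'd:2@10'};   % part of the output above
m = regexprep(m, ':\d+@0$', '');                   % drop the ':<value>@0' suffix
msg = strjoin(m, ' ');                             % 'c:1@6 d:1@10 a a d:2@10'
```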

Upvotes: 0
