Reputation: 119
I have the following problem:
I need to decode integer sequences "c" into character-string messages "m" using the following association:
numpos = 10 % ( = size(c,2)/2)
c = [3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3]
Each row of "c" holds 2*numpos integers, where the first numpos values are indices into
types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'}
and the second numpos values are parameters that are applied only if the selected type contains the character '@', like this:
m = ' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
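To make the mapping concrete, here is a minimal loop-based sketch that decodes the single example row above. It is written for clarity rather than speed, and it assumes each type contains at most one '@'; the loop variables p, t and k are illustrative names only.

```matlab
% Reference decoder for one row of c (clarity over speed).
c = [3 4 1 1 4 2 5 2 3 3, 1 1 1 1 2 2 2 3 3 3];
types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};
numpos = numel(c)/2;
m = '';
for p = 1:numpos
    t = types{c(p)};      % type selected by the first half of c
    k = c(numpos + p);    % parameter taken from the second half of c
    % Insert ':k' before '@'; types without '@' are left unchanged.
    t = strrep(t, '@', sprintf(':%d@', k));
    m = [m ' ' t];        % messages are space-separated, with a leading space
end
disp(m)
```

For the example row this reproduces the message shown above, including the leading space.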
My current solution is as follows:
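function m = c2m(c,types)
numpos = size(c,2)/2;
numlines = size(c,1);  % number of messages (one per row of c)
% turn each type into a sprintf format, e.g. 'b@2' -> ' b:%d@2'
F = cellfun(@(f) [' ' f], strrep(types,'@',':%d@'),'unif',0);
% format every (type, parameter) pair separately
m = arrayfun(@(f,k) sprintf(f{1},k),F(c(:,1:numpos)),c(:,numpos+(1:numpos)),'unif', 0);
% concatenate the pieces of each row into one message
m = arrayfun(@(i) horzcat(m{i,:}), (1:numlines)', 'unif', 0);
end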
and the testing code is as follows:
numlines = 10;
c = repmat([3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3],numlines,1);
types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};
m = c2m(c,types);
m =
10×1 cell array
{' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
{' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
{' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
{' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
{' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
{' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
{' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
{' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
{' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
{' c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'}
The code is still too slow for me, and I am looking for any speed-up. In this case the most significant fraction of CPU time is spent in the built-in function "sprintf".
Typical realistic problem sizes are:
numpos ~ 30 ... 60
numlines ~ 1e4 ... 1e5
Any idea?
Upvotes: 3
Views: 101
Reputation: 706
In R2016b, MATLAB shipped some new text functions that make this easy. R2016b also introduced the new string datatype, which makes this fast.
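function m = c2m_new(c,types, numlines)  % numlines is unused here
types = string(types);
num_values = size(c,2)/2;
a = c(:,1:num_values);              % type indices
b = c(:,(num_values+1):end);        % parameters
m = types(a);
m = insertBefore(m,"@", ":" + b);   % elements without "@" are left unchanged
m = join(m,2);                      % one space-separated string per row
end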
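As a small sketch of why no special-casing is needed for types without "@", the snippet below applies the same insertBefore and join calls to a tiny 2-by-2 example. The matrices a and b here are made-up illustrative inputs, not data from the question.

```matlab
% Tiny illustration of the insertBefore/join steps (requires R2016b or later).
types = string({'a' 'b@2' 'c@6'});
a = [3 1; 2 2];   % hypothetical type indices
b = [1 7; 2 3];   % hypothetical parameters
% ":" + b converts the numbers to strings: [":1" ":7"; ":2" ":3"]
m = insertBefore(types(a), "@", ":" + b);
% "a" contains no "@", so its parameter (7) is simply never inserted
rows = join(m, 2);   % join each row with a single space
disp(rows)
```

Here m comes out as ["c:1@6" "a"; "b:2@2" "b:3@2"], so the parameter paired with "a" is silently dropped, which is the behaviour the question asks for.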
>> numlines = 10;
>> c = repmat([3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3],numlines,1);
>> types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};
>> c2m_new(c,types,numlines)
ans =
10×1 string array
"c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
"c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
"c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
"c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
"c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
"c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
"c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
"c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
"c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
"c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6"
Looking at performance:
>> numlines = 1E4;
>> c = repmat([3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3],numlines,1);
>> types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};
% My solution
>> tic; for i = 1:10; c2m_new(c,types, numlines); end; toc
Elapsed time is 0.669311 seconds.
% michalkvasnicka's solution
>> tic; for i = 1:10; c2m(c,types, numlines); end; toc
Elapsed time is 23.643991 seconds.
% gnovice's solution
>> tic; for i = 1:10; c2m_gnovice(c,types, numlines); end; toc
Elapsed time is 8.960392 seconds.
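If you want to repeat this kind of measurement, timeit may give steadier numbers than a manual tic/toc loop, since it calls the function several times and reports the median. The following is only a sketch: it inlines the string-based decode as an anonymous function (named decode here purely for illustration) so that the snippet is self-contained, and the timings will differ by machine and MATLAB release.

```matlab
% Timing sketch using timeit (string functions require R2016b or later).
numlines = 1e4;
c = repmat([3 4 1 1 4 2 5 2 3 3, 1 1 1 1 2 2 2 3 3 3], numlines, 1);
types = string({'a' 'b@2' 'c@6' 'd@10' 'e@11'});
n = size(c, 2)/2;
a = c(:, 1:n);          % type indices
b = c(:, n+1:end);      % parameters
% Same steps as the string-based function, wrapped for timeit.
decode = @() join(insertBefore(types(a), "@", ":" + b), 2);
t = timeit(decode);     % median time of repeated calls, in seconds
fprintf('median time per call: %.4f s\n', t);
```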
Upvotes: 2
Reputation: 125854
Here's an alternative for your c2m function that is 2 to 3 times faster for the typical ranges of numpos and numlines you list above:
function m = c2m(c, types)
numpos = size(c, 2)/2;
[pre, post] = strtok(types(c(:, 1:numpos)), '@');
mid = strsplit(sprintf(' :%i', 1:max(max(c(:, numpos+1:2*numpos)))));
mid = mid(c(:, numpos+1:2*numpos).*~cellfun(@isempty, post)+1);
m = cellstr(char(join(strcat(pre, mid, post))));
end
First, the type strings selected by the first half of c are split at the '@' using strtok. Then a cell array mid is created containing the strings {'' ':1' ':2' ... ':N'}, where N is the maximum value found in the second half of c. This allows us to avoid costly conversion functions applied to the whole matrix (like sprintf, num2str, int2str, etc.) by simply indexing into mid to get the string we want. The index is just the right half of c multiplied by a logical array representing whether '@' is present or not (obtained using cellfun) and incremented by 1.
Finally, the three different strings (pre, mid, and post) are concatenated using strcat, collected row-wise into strings using join (present since R2016b), then converted to a cell array of character arrays with cellstr and char.
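The indexing step can be seen in isolation in the short sketch below. The vectors params and hasAt are made-up stand-ins for one row of the right half of c and for the "contains '@'" mask.

```matlab
% Lookup-table trick in isolation: convert numbers to text once, then index.
mid = strsplit(sprintf(' :%i', 1:3));   % {'' ':1' ':2' ':3'}
params = [1 1 2 3];                     % hypothetical parameters
hasAt  = [true false true true];        % hypothetical "type contains '@'" mask
% Masked-out entries become 0, so after +1 they select the empty string.
idx = params .* hasAt + 1;              % [2 1 3 4]
picked = mid(idx);                      % {':1' '' ':2' ':3'}
disp(picked)
```

The number-to-text conversion runs only N times when building mid, rather than once per matrix element.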
Testing it with these values:
numpos = 10;
numlines = 10;
c = repmat([3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3], numlines, 1);
types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'};
We get the desired result:
m =
10×1 cell array
'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
'c:1@6 d:1@10 a a d:2@10 b:2@2 e:2@11 b:3@2 c:3@6 c:3@6'
Upvotes: 0
Reputation: 7751
Here is an idea to start experimenting with.
numpos = 10 % ( = size(c,2)/2)
c = [3 4 1 1 4 2 5 2 3 3,1 1 1 1 2 2 2 3 3 3];
types = {'a' 'b@2' 'c@6' 'd@10' 'e@11'}
Then
types_mod = {'a@0' 'b@2' 'c@6' 'd@10' 'e@11'}'; % so that every item contains an '@'
types_mod_split = split(types_mod,'@');
m = strcat(types_mod_split(c(1:10),1), repmat({':'},10,1), num2str(c(11:20)'), repmat({'@'},10,1), types_mod_split(c(1:10),2))'
which gives
m =
1×10 cell array
Columns 1 through 10
{'c:1@6'} {'d:1@10'} {'a:1@0'} {'a:1@0'} {'d:2@10'} {'b:2@2'} {'e:2@11'} {'b:3@2'} {'c:3@6'} {'c:3@6'}
Upvotes: 0