Reputation:
This post is about testing the performances of solutions to the following problem:
A cell array of
S
strings formatted as numbers separated by underscores is given, e.g.:
list_of_words = repmat({'02_04_04_52_23_14_54_672_0'},10,1);
Is guaranteed for all strings that:
- they contain only decimal digits and underscores;
- the number of underscores is the same;
- there are no two consecutive underscores.
Write MATLAB code that would convert the cell array of strings to a numeric matrix
S
×M
of doubles (values 1000 time smaller), whereS
is the number of strings andM
is the number of numbers in a string.
A very similar problem was posted on StackOverflow, with several solutions proposed. The original goal was improving speed performance.
Two of the solutions, designed for speed, seem to have wildly varying performances from one MATLAB installation to another, or even from one run to another. Also, they have really different implementation styles.
In the dark corner you'll find a solution that goes against MATLAB's most sacred tenets: eval
is evil, and loops should be avoided at all costs. However, the code tries to optimise the speed by avoiding repeated memory allocations, by using the fast ways of converting strings to numbers, and by doing in-place operations:
%'eval_and_loops_solution.m'
function array_of_numbers = eval_and_loops_solution(list_of_words)
n_numbers = 1 + sum(list_of_words{1}=='_');
n_words = numel(list_of_words);
array_of_numbers = zeros(n_numbers, n_words);
for k = 1:n_words
temp = ['[', list_of_words{k}, ']'];
temp(temp=='_') = ';';
array_of_numbers(:,k) = eval(temp);
end;
%'this is a waste of memory, but is kind of required'
array_of_numbers = transpose(array_of_numbers / 1000);
end
Note: The original solution returned the result as M
×S
double
array. The code was adapted for S
×M
; however, this adaptation consumes twice the memory. And yes, I wrote this solution.
In the clear corner you'll find a solution true to MATLAB spirit, that avoids using loops and eval
, favoring a single sscanf
read of all the strings (which is a clever way of avoiding the overhead of calling the function S
times):
%'single_sscanf_solution.m'
function array_of_numbers = single_sscanf_solution(list_of_words)
N = 1 + sum(list_of_words{1}=='_'); %'hard-coded in the original'
lens = cellfun(@numel,list_of_words);
tlens = sum(lens);
idx(1,tlens)=0;
idx(cumsum(lens(1:end-1))+1)=1;
idx2 = (1:tlens) + cumsum(idx);
one_str(1:max(idx2)+1)='_';
one_str(idx2) = [list_of_words{:}];
delim = repmat('%d_',1,N*numel(lens));
array_of_numbers = reshape(sscanf(one_str, delim),N,[])'./1000;
end
Note: This solution belongs to @Divakar.
The referee is made up of two functions: one that generates test cases, and one that does the timig.
The test case generator groups in a cell array underscore-delimited strings out of 10 random integers between 1 and 9999 (for the sake of simplicity); however, an implementation should only assume that the numbers are positive or zero, and the number should fit in a double
:
%'referee_test_case.m'
function list_of_words = referee_test_case(n_words)
NUM_PER_WORD = 10;
NUM_MAX = 9999;
word_format = [repmat('%d_', 1, NUM_PER_WORD-1), '%d'];
list_of_words = cell(n_words,1);
fprintf('Generating %d-words test case...\n', n_words);
for k = 1:n_words
list_of_words{k} = sprintf(word_format, randi(NUM_MAX, [1, NUM_PER_WORD]));
end;
end
The timing function takes as arguments a test case and an arbitrary number of processing function handles; it is implemented such as errors in one function should not bother the execution of the others. It uses timeit
as recommended by @Divakar and @LuisMendo; for those who don't have this function in their default MATLAB installation, it may be downloaded from MATLAB Central / File Exchange:
%'referee_timing.m'
function referee_timing(test_case, varargin)
%' Specify the test case as a cell array of strings, then the function '
%' handles that will process the test case. '
%' '
%' The function uses timeit() to evaluate the performance of different '
%' processing handles; if your MATLAB installation does not have it '
%' natively, download the available version from File Exchange: '
%' '
%' http://www.mathworks.com/matlabcentral/fileexchange/18798-timeit-benchmarking-function '
fprintf('Timing %d-words test case...\n', numel(test_case));
for k = 1:numel(varargin)
try
t = timeit(@() varargin{k}(test_case));
fprintf(' %s: %f[s]\n', func2str(varargin{k}), t);
catch ME
fprintf(' %s: Error - %s\n', func2str(varargin{k}), ME.message);
end;
end;
end
As said before, the results seem to vary from one MATLAB installation to another, and even from one run to another. The problem that I propose is three-fold:
For point 1. (benchmarking), please create in your MATLAB path the four function files eval_and_loops_solution.m
, single_sscanf_solution.m
, referee_test_case.m
, referee_timig.m
and other functions that you might want to test (for example, the implementations proposed by answers). Use them for several numbers of words, e.g. like this script:
%'test.m'
clc;
feature accel on; %'tune for speed'
%'low memory stress'
referee_timing( ...
referee_test_case(10^4), ...
@eval_and_loops_solution, ...
@single_sscanf_solution ... %'add here other functions'
);
%'medium memory stress'
referee_timing( ...
referee_test_case(10^5), ...
@eval_and_loops_solution, ...
@single_sscanf_solution ... %'add here other functions'
);
%'high memory stress'
referee_timing( ...
referee_test_case(10^6), ...
@eval_and_loops_solution, ...
@single_sscanf_solution ... %'add here other functions'
);
When posting the results, please specify your MATLAB version, the OS, the size of RAM and the CPU model. Please be aware that these tests might take some time for large number of words! However, running this script will not alter the content of your current workspace.
For points 2. or 3. you can post code that use the same in/out conventions as eval_and_loops_solution.m
and single_sscanf_solution.m
, along with the benchmarking that supports the claim.
I will up-vote every benchmarking answer, and I encourage everybody to do that. Is the least that one can do for people that contribute with their skills, time and processing power.
A +500 bounty bonus will go to the author of the fastest code, be it one of the two proposed, or a third new one that outperforms them. I really hope that this will match the general agreement. The bonus will be given in a month week from the original date of posting.
Upvotes: 17
Views: 1551
Reputation: 30579
This builds on Amro's MEX solution. Since the problem as defined by CST-Link is so tightly constrained, a fast implementation can be made by sacrificing robustness and forgoing error handling. Hence, format the input as specified!
Substitute the main loop in Amro's source with the following for ~4x speed up, without multi-threading (full source):
// extract numbers from strings
for (mwIndex idx=0, i=0; idx<n_words; ++idx) {
std::string str = getString(cellstr, idx);
std::string::const_iterator istr = str.cbegin();
for (; istr != str.cend(); ++istr) {
if (*istr < '0' || *istr > '9')
{
++i;
continue;
}
out[i] *= 10;
out[i] += *istr - '0';
}
++i;
}
More manual optimization can be done, but I'm going to bed. Also, I'm planning to the multi-threaded version at some point, but it's pretty straightforward to add the pragma.
Benchmarks (UPDATED T-minus 12 hrs until end of grace period)
Benchmarks with Amro's latest package of functions plus Ben Voigt's solution show different timings on different machines, but for what it's worth it looks like there is a virtual tie between a few solutions. On my i7 laptop with R2014a 64-bit on Windows 8 (on CUDA), my textscan, Divakar's latest and Ben Voigt's solutions and are best, with indistinguishable performance differences. CST-Link's "all loops" is very close but consistently a hair slower than the others. These benchmarks use up to 1e6 strings. The size of the cell array decides the fastest method.
Taking out the MEX solutions (and GPU because my of my laptop), the solutions are placed as follows for various numbers of rows. Again, the results suggest that the appropriate method depends on the number of rows:
Solution 1e6 5e5 1e5 1e4 1e3 100 ____________________________________ ____ ____ ____ ____ ____ ____ 'solution_textscan_sprintf_chappjc' 1 2 2 5 5 4 'solution_bsxfun_sprintf_Divakar' 2 1 3 2 1 8 'func_voigt_loop' 3 3 5 1 2 1 'func_textscan_sprintf' 4 4 4 4 6 3 'solution_bsxfun_bytestream_Divakar' 5 5 6 3 3 10 'solution_loops_CSTLink' 6 7 7 7 8 6 'func_sscanf_sprintf' 7 6 1 6 7 7 'solution_bsxfun_cumsum_Divakar' 8 8 8 8 4 9 'solution_sscanf_char_LuisMendo' 9 9 9 9 9 11 'func_textscan_strjoin' 10 10 15 10 10 16 'func_sscanf_strjoin' 11 11 10 11 11 19 'func_eval_cellfun' 12 12 13 12 12 18 'func_eval_loop' 13 13 11 13 13 12 'solution_eval_loops_CSTLink' 14 15 14 14 14 13 'func_sscanf_loop' 15 14 12 15 15 15 'func_textscan_loop' 16 18 19 18 18 21 'func_sscanf_cellfun' 17 17 16 17 17 22 'func_str2num_cellfun' 18 19 18 19 19 23 'func_str2num_loop' 19 20 20 20 20 24 'solution_sscanf_Divakar' 20 16 17 16 16 20 'func_textscan_cellfun' 21 21 21 21 21 25 'func_str2num_sprintf' 22 23 23 22 22 5 'func_str2num_strjoin' 23 24 25 23 23 17 'func_eval_sprintf' 24 22 24 24 24 2 'func_eval_strjoin' 25 25 22 25 25 14
>> t
t =
func nrows ncols time
____________________________________ _____ _____ __________
'func_eval_cellfun' 100 10 0.00074097
'func_eval_loop' 100 10 0.0006137
'func_eval_sprintf' 100 10 0.00017814
'func_eval_strjoin' 100 10 0.00068062
'func_sscanf_cellfun' 100 10 0.0012905
'func_sscanf_loop' 100 10 0.00069992
'func_sscanf_sprintf' 100 10 0.00022678
'func_sscanf_strjoin' 100 10 0.00075428
'func_str2num_cellfun' 100 10 0.0014366
'func_str2num_loop' 100 10 0.0014904
'func_str2num_sprintf' 100 10 0.00020667
'func_str2num_strjoin' 100 10 0.00073786
'func_textscan_cellfun' 100 10 0.0018517
'func_textscan_loop' 100 10 0.0012629
'func_textscan_sprintf' 100 10 0.00020092
'func_textscan_strjoin' 100 10 0.00071305
'func_voigt_loop' 100 10 0.00017711
'solution_bsxfun_bytestream_Divakar' 100 10 0.00029257
'solution_bsxfun_cumsum_Divakar' 100 10 0.00028395
'solution_bsxfun_sprintf_Divakar' 100 10 0.00023879
'solution_eval_loops_CSTLink' 100 10 0.00066461
'solution_loops_CSTLink' 100 10 0.00021923
'solution_mex_Amro' 100 10 0.00020502
'solution_mex_chappjc' 100 10 6.3164e-05
'solution_mex_omp_Amro' 100 10 0.00018224
'solution_mex_omp_chappjc' 100 10 8.2565e-05
'solution_sscanf_Divakar' 100 10 0.00084008
'solution_sscanf_char_LuisMendo' 100 10 0.00033844
'solution_textscan_sprintf_chappjc' 100 10 0.00020396
'func_eval_cellfun' 1000 10 0.0058036
'func_eval_loop' 1000 10 0.0060269
'func_eval_sprintf' 1000 10 0.055797
'func_eval_strjoin' 1000 10 0.057631
'func_sscanf_cellfun' 1000 10 0.011712
'func_sscanf_loop' 1000 10 0.0067405
'func_sscanf_sprintf' 1000 10 0.0019112
'func_sscanf_strjoin' 1000 10 0.0040608
'func_str2num_cellfun' 1000 10 0.013712
'func_str2num_loop' 1000 10 0.014961
'func_str2num_sprintf' 1000 10 0.04916
'func_str2num_strjoin' 1000 10 0.051823
'func_textscan_cellfun' 1000 10 0.017256
'func_textscan_loop' 1000 10 0.012454
'func_textscan_sprintf' 1000 10 0.0016489
'func_textscan_strjoin' 1000 10 0.0038387
'func_voigt_loop' 1000 10 0.0012892
'solution_bsxfun_bytestream_Divakar' 1000 10 0.0013951
'solution_bsxfun_cumsum_Divakar' 1000 10 0.0015138
'solution_bsxfun_sprintf_Divakar' 1000 10 0.0011496
'solution_eval_loops_CSTLink' 1000 10 0.0061538
'solution_loops_CSTLink' 1000 10 0.0020528
'solution_mex_Amro' 1000 10 0.0019629
'solution_mex_chappjc' 1000 10 0.00051825
'solution_mex_omp_Amro' 1000 10 0.00085117
'solution_mex_omp_chappjc' 1000 10 0.00025825
'solution_sscanf_Divakar' 1000 10 0.0078551
'solution_sscanf_char_LuisMendo' 1000 10 0.0031104
'solution_textscan_sprintf_chappjc' 1000 10 0.0016144
'func_eval_cellfun' 10000 10 0.05772
'func_eval_loop' 10000 10 0.061705
'func_eval_sprintf' 10000 10 0.54464
'func_eval_strjoin' 10000 10 0.57007
'func_sscanf_cellfun' 10000 10 0.1192
'func_sscanf_loop' 10000 10 0.068017
'func_sscanf_sprintf' 10000 10 0.019622
'func_sscanf_strjoin' 10000 10 0.038232
'func_str2num_cellfun' 10000 10 0.13811
'func_str2num_loop' 10000 10 0.14812
'func_str2num_sprintf' 10000 10 0.48726
'func_str2num_strjoin' 10000 10 0.50528
'func_textscan_cellfun' 10000 10 0.17378
'func_textscan_loop' 10000 10 0.1241
'func_textscan_sprintf' 10000 10 0.016595
'func_textscan_strjoin' 10000 10 0.035599
'func_voigt_loop' 10000 10 0.012136
'solution_bsxfun_bytestream_Divakar' 10000 10 0.015908
'solution_bsxfun_cumsum_Divakar' 10000 10 0.02301
'solution_bsxfun_sprintf_Divakar' 10000 10 0.014862
'solution_eval_loops_CSTLink' 10000 10 0.063188
'solution_loops_CSTLink' 10000 10 0.020153
'solution_mex_Amro' 10000 10 0.019252
'solution_mex_chappjc' 10000 10 0.0051221
'solution_mex_omp_Amro' 10000 10 0.0066551
'solution_mex_omp_chappjc' 10000 10 0.0014584
'solution_sscanf_Divakar' 10000 10 0.096345
'solution_sscanf_char_LuisMendo' 10000 10 0.031047
'solution_textscan_sprintf_chappjc' 10000 10 0.016736
'func_eval_cellfun' 1e+05 10 0.78876
'func_eval_loop' 1e+05 10 0.6119
'func_eval_sprintf' 1e+05 10 6.7603
'func_eval_strjoin' 1e+05 10 5.7204
'func_sscanf_cellfun' 1e+05 10 1.2096
'func_sscanf_loop' 1e+05 10 0.68303
'func_sscanf_sprintf' 1e+05 10 0.21101
'func_sscanf_strjoin' 1e+05 10 0.55226
'func_str2num_cellfun' 1e+05 10 1.411
'func_str2num_loop' 1e+05 10 1.8229
'func_str2num_sprintf' 1e+05 10 6.1474
'func_str2num_strjoin' 1e+05 10 7.551
'func_textscan_cellfun' 1e+05 10 2.5898
'func_textscan_loop' 1e+05 10 1.7934
'func_textscan_sprintf' 1e+05 10 0.25421
'func_textscan_strjoin' 1e+05 10 1.1762
'func_voigt_loop' 1e+05 10 0.25602
'solution_bsxfun_bytestream_Divakar' 1e+05 10 0.265
'solution_bsxfun_cumsum_Divakar' 1e+05 10 0.35656
'solution_bsxfun_sprintf_Divakar' 1e+05 10 0.23481
'solution_eval_loops_CSTLink' 1e+05 10 0.86425
'solution_loops_CSTLink' 1e+05 10 0.28436
'solution_mex_Amro' 1e+05 10 0.27104
'solution_mex_chappjc' 1e+05 10 0.078901
'solution_mex_omp_Amro' 1e+05 10 0.096553
'solution_mex_omp_chappjc' 1e+05 10 0.03679
'solution_sscanf_Divakar' 1e+05 10 1.3818
'solution_sscanf_char_LuisMendo' 1e+05 10 0.43994
'solution_textscan_sprintf_chappjc' 1e+05 10 0.21271
'func_eval_cellfun' 5e+05 10 3.7658
'func_eval_loop' 5e+05 10 3.8106
'func_eval_sprintf' 5e+05 10 32.383
'func_eval_strjoin' 5e+05 10 40.451
'func_sscanf_cellfun' 5e+05 10 8.5704
'func_sscanf_loop' 5e+05 10 4.707
'func_sscanf_sprintf' 5e+05 10 1.4362
'func_sscanf_strjoin' 5e+05 10 2.8301
'func_str2num_cellfun' 5e+05 10 9.6439
'func_str2num_loop' 5e+05 10 10.453
'func_str2num_sprintf' 5e+05 10 35.818
'func_str2num_strjoin' 5e+05 10 37.277
'func_textscan_cellfun' 5e+05 10 12.418
'func_textscan_loop' 5e+05 10 8.8237
'func_textscan_sprintf' 5e+05 10 1.2793
'func_textscan_strjoin' 5e+05 10 2.6496
'func_voigt_loop' 5e+05 10 1.2486
'solution_bsxfun_bytestream_Divakar' 5e+05 10 1.324
'solution_bsxfun_cumsum_Divakar' 5e+05 10 1.8229
'solution_bsxfun_sprintf_Divakar' 5e+05 10 1.2113
'solution_eval_loops_CSTLink' 5e+05 10 6.5759
'solution_loops_CSTLink' 5e+05 10 1.4583
'solution_mex_Amro' 5e+05 10 1.3718
'solution_mex_chappjc' 5e+05 10 0.39711
'solution_mex_omp_Amro' 5e+05 10 0.48046
'solution_mex_omp_chappjc' 5e+05 10 0.48174
'solution_sscanf_Divakar' 5e+05 10 7.7943
'solution_sscanf_char_LuisMendo' 5e+05 10 2.2332
'solution_textscan_sprintf_chappjc' 5e+05 10 1.2399
'func_eval_cellfun' 1e+06 10 7.3884
'func_eval_loop' 1e+06 10 7.5519
'func_eval_sprintf' 1e+06 10 69.868
'func_eval_strjoin' 1e+06 10 71.964
'func_sscanf_cellfun' 1e+06 10 15.061
'func_sscanf_loop' 1e+06 10 8.4163
'func_sscanf_sprintf' 1e+06 10 2.7099
'func_sscanf_strjoin' 1e+06 10 5.1453
'func_str2num_cellfun' 1e+06 10 17.42
'func_str2num_loop' 1e+06 10 18.165
'func_str2num_sprintf' 1e+06 10 60.902
'func_str2num_strjoin' 1e+06 10 63.579
'func_textscan_cellfun' 1e+06 10 20.423
'func_textscan_loop' 1e+06 10 14.309
'func_textscan_sprintf' 1e+06 10 2.2853
'func_textscan_strjoin' 1e+06 10 4.5216
'func_voigt_loop' 1e+06 10 2.2443
'solution_bsxfun_bytestream_Divakar' 1e+06 10 2.3495
'solution_bsxfun_cumsum_Divakar' 1e+06 10 3.3843
'solution_bsxfun_sprintf_Divakar' 1e+06 10 2.0311
'solution_eval_loops_CSTLink' 1e+06 10 7.7524
'solution_loops_CSTLink' 1e+06 10 2.4947
'solution_mex_Amro' 1e+06 10 2.486
'solution_mex_chappjc' 1e+06 10 0.76551
'solution_mex_omp_Amro' 1e+06 10 0.92226
'solution_mex_omp_chappjc' 1e+06 10 0.88736
'solution_sscanf_Divakar' 1e+06 10 19.673
'solution_sscanf_char_LuisMendo' 1e+06 10 3.8578
'solution_textscan_sprintf_chappjc' 1e+06 10 2.0074
Next...
The order is different, but the differences are again insignificant.
Timings exceed 30000 character limit, but I can post them somewhere if desired. The order is clear though.
I suggest CST-Link disregard all these measurements in deciding.
Of course, the MEX solutions rule. The solution above with std::string::const_iterator istr
is fast, even faster with OpenMP. GCC 4.9.1 (pthreads) and VS2013 have comparable performance for this code. Threaded source here.
Upvotes: 6
Reputation: 283614
Here, in the MATLAB holiday spirit, is a highly-vectorized solution:
function M = func_voigt_loop(C)
S = sprintf('_%s', C{:});
digitpos = [find(S(2:end) == '_'), numel(S)];
M = zeros(size(digitpos));
place = 1;
while 1
digits = S(digitpos);
mask = double(digits ~= '_');
M = M + mask.*(digits - '0') * place;
place = place * 10;
digitpos = digitpos - mask;
if ~any(mask); break; end
end
M = reshape(M, [], numel(C))';
end
It's only a few lines, easy to understand (grab the ones digit of every number, add in the tens digit, etc), and doesn't use any "exotic" functions like eval
or textscan
. It's also pretty fast.
This is an idea similar to something I previously developed for doing datenum
on an entire column of data in a CSV file, the MATLAB version, even with an particular pattern specified, was just horribly slow. A later release of MATLAB improved datenum significantly, but my hand-tuned version still beats it handily.
Upvotes: 2
Reputation: 124543
Here is a bunch of implementations for you to benchmark (all pure MATLAB). Some of them borrow ideas from the other solutions.
function M = func_eval_strjoin(C)
M = eval(['[' strjoin(strrep(C,'_',' ').',';') ']']);
end
function M = func_eval_sprintf(C)
C = strrep(C,'_',' ');
M = eval(['[' sprintf('%s;',C{:}) ']']);
end
function M = func_eval_cellfun(C)
M = cell2mat(cellfun(@eval, strcat('[',strrep(C,'_',' '),']'), 'Uniform',false));
end
function M = func_eval_loop(C)
M = zeros(numel(C),nnz(C{1}=='_')+1);
for i=1:numel(C)
M(i,:) = eval(['[' strrep(C{i},'_',' ') ']']);
end
end
function M = func_str2num_strjoin(C)
M = reshape(str2num(strjoin(strrep(C.','_',' '))), [], numel(C)).';
end
function M = func_str2num_sprintf(C)
C = strrep(C,'_',' ');
M = reshape(str2num(sprintf('%s ',C{:})), [], numel(C)).';
end
function M = func_str2num_cellfun(C)
M = cell2mat(cellfun(@str2num, strrep(C, '_', ' '), 'Uniform',false));
end
function M = func_str2num_loop(C)
M = zeros(numel(C),nnz(C{1}=='_')+1);
for i=1:numel(C)
M(i,:) = str2num(strrep(C{i}, '_', ' '));
end
end
function M = func_sscanf_strjoin(C)
M = reshape(sscanf(strjoin(C', '_'), '%f_'), [], numel(C)).';
end
function M = func_sscanf_sprintf(C)
M = reshape(sscanf(sprintf('%s_',C{:}),'%f_'), [], numel(C)).';
end
function M = func_sscanf_cellfun(C)
M = cell2mat(cellfun(@(c) sscanf(c, '%f_'), C, 'Uniform',false).').';
end
function M = func_sscanf_loop(C)
M = zeros(numel(C),nnz(C{1}=='_')+1);
for i=1:numel(C)
M(i,:) = sscanf(C{i}, '%f_');
end
end
function M = func_textscan_strjoin(C)
M = textscan(strjoin(C', '_'), '%.0f', 'Delimiter','_');
M = reshape(M{1}, [], numel(C)).';
end
function M = func_textscan_sprintf(C)
M = textscan(sprintf('%s_',C{:}),'%.0f', 'Delimiter','_');
M = reshape(M{1}, [], numel(C)).';
end
function M = func_textscan_cellfun(C)
M = cell2mat(cellfun(@(str) textscan(str, '%.0f', 'Delimiter','_'), C).').';
end
function M = func_textscan_loop(C)
M = zeros(numel(C),nnz(C{1}=='_')+1);
for i=1:numel(C)
x = textscan(C{i}, '%.0f', 'Delimiter','_');
M(i,:) = x{1};
end
end
function M = func_str2double_strsplit_strjoin(C)
M = reshape(str2double(strsplit(strjoin(C','_'),'_')), [], numel(C)).';
end
function M = func_str2double_strsplit_sprintf(C)
M = strsplit(sprintf('%s_',C{:}), '_');
M = reshape(str2double(M(1:end-1)), [], numel(C)).';
end
function M = func_str2double_strsplit(C)
M = cellfun(@(c) strsplit(c,'_'), C, 'Uniform',false);
M = str2double(cat(1,M{:}));
end
function M = func_str2double_strsplit_cellfun(C)
M = cell2mat(cellfun(@(c) str2double(strsplit(c,'_')), C, 'Uniform',false));
end
function M = func_str2double_strsplit_loop(C)
M = zeros(numel(C),nnz(C{1}=='_')+1);
for i=1:numel(C)
M(i,:) = str2double(strsplit(C{i},'_'));
end
end
function M = func_str2double_regex_split_strjoin(C)
M = reshape(str2double(regexp(strjoin(C.','_'), '_', 'split')), [], numel(C)).';
end
function M = func_str2double_regex_split_sprintf(C)
M = regexp(sprintf('%s_',C{:}), '_', 'split');
M = reshape(str2double(M(1:end-1)), [], numel(C)).';
end
function M = func_str2double_regex_split(C)
M = regexp(C, '_', 'split');
M = reshape(str2double([M{:}]), [], numel(C)).';
end
function M = func_str2double_regex_split_cellfun(C)
M = cell2mat(cellfun(@str2double, regexp(C, '_', 'split'), 'Uniform',false));
end
function M = func_str2double_regex_split_loop(C)
M = zeros(numel(C),nnz(C{1}=='_')+1);
for i=1:numel(C)
M(i,:) = str2double(regexp(C{i}, '_', 'split'));
end
end
function M = func_str2double_regex_tokens_strjoin_1(C)
M = reshape(cellfun(@str2double, regexp(strjoin(C.','_'), '(\d+)', 'tokens')), [], numel(C)).';
end
function M = func_str2double_regex_tokens_strjoin_2(C)
M = regexp(strjoin(C.','_'), '(\d+)', 'tokens');
M = reshape(str2double([M{:}]), [], numel(C)).';
end
function M = func_str2double_regex_tokens_sprintf_1(C)
M = reshape(cellfun(@str2double, regexp(sprintf('%s_',C{:}), '(\d+)', 'tokens')), [], numel(C)).';
end
function M = func_str2double_regex_tokens_sprintf_2(C)
M = regexp(sprintf('%s_',C{:}), '(\d+)', 'tokens');
M = reshape(str2double([M{:}]), [], numel(C)).';
end
function M = func_str2double_regex_tokens(C)
M = regexp(C, '(\d+)', 'tokens');
M = cat(1,M{:});
M = reshape(str2double([M{:}]), size(M));
end
function M = func_str2double_regex_tokens_cellfun(C)
M = regexp(C, '(\d+)', 'tokens');
M = cellfun(@str2double, cat(1,M{:}));
end
function M = func_str2double_regex_tokens_loop(C)
M = zeros(numel(C),nnz(C{1}=='_')+1);
for i=1:numel(C)
x = regexp(C{i}, '(\d+)', 'tokens');
M(i,:) = str2double([x{:}]);
end
end
Most methods are variations of the same idea, only with slightly different implementations (e.g: using explicit for-loop vs. cellfun
, using strjoin
vs. sprintf
to join a cell array of strings into one string, etc..).
Let me break it down a bit more:
there's the eval
-based solutions. We either apply eval
in a loop, or we make a single call after flattening the strings into one.
similarly there are solutions based on calling str2num
(after replacing underscores with spaces). It's worth noting that str2num
itself calls eval
internally.
there's also the sscanf
and textscan
solutions. Like before, we either use them in a loop, or called on one long string.
another set of solutions are based on calling str2double
after splitting the strings by the underscore delimiter. Again there alternative ways to split the strings (using regex
with the "split" option, or using the strsplit
function).
finally there a set of solutions based on regular expression matching.
I created a set of functions for benchmarking the various implementations (GitHub Gist). Here are the pre-packaged files with all solutions posted so far. I included the compiled MEX-functions (for 64-bit Windows), along with MinGW-w64 DLL dependencies.
Here are the results of running on my machine (Laptop with quad-core Intel Core i7 CPU, 8GB, Win8.1, MATLAB R2014a). I tested the following sizes: 10x10, 100x10, 1000x10, and 10000x10. As you can see, the MEX solutions do better as you increase the data size...
As requested, I updated my test results with the latest solutions; I've added the modified versions of the MEX-file by @chappjc, and the GPU versions by @Divakar. Here is the updated archive of files.
I'm using the same machine as before. I've increased the cell-array size up to 1e5. My GPU device is decent for a laptop:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'GeForce GT 630M'
Index: 1
ComputeCapability: '2.1'
SupportsDouble: 1
DriverVersion: 5.5000
ToolkitVersion: 5.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [65535 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
FreeMemory: 2.0453e+09
MultiprocessorCount: 2
ClockRateKHz: 950000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Here are the latest results:
>> t
t =
func nrows ncols time
________________________________________ _____ _____ __________
'func_eval_cellfun' 10 10 0.00044645
'func_eval_loop' 10 10 0.0001554
'func_eval_sprintf' 10 10 7.6547e-05
'func_eval_strjoin' 10 10 0.00056739
'func_sscanf_cellfun' 10 10 0.00037247
'func_sscanf_loop' 10 10 0.00017182
'func_sscanf_sprintf' 10 10 8.4928e-05
'func_sscanf_strjoin' 10 10 0.00056388
'func_str2num_cellfun' 10 10 0.00039231
'func_str2num_loop' 10 10 0.00033852
'func_str2num_sprintf' 10 10 0.00010862
'func_str2num_strjoin' 10 10 0.00057953
'func_textscan_cellfun' 10 10 0.00044585
'func_textscan_loop' 10 10 0.00024666
'func_textscan_sprintf' 10 10 9.4507e-05
'func_textscan_strjoin' 10 10 0.00056123
'solution_bsxfun_bytestream_Divakar' 10 10 0.00018166
'solution_bsxfun_bytestream_gpu_Divakar' 10 10 0.0029487
'solution_bsxfun_cumsum_Divakar' 10 10 0.00016396
'solution_bsxfun_sprintf_Divakar' 10 10 0.00012932
'solution_bsxfun_sprintf_gpu_Divakar' 10 10 0.002315
'solution_eval_loops_CSTLink' 10 10 0.00017191
'solution_loops_CSTLink' 10 10 6.5514e-05
'solution_mex_Amro' 10 10 6.4487e-05
'solution_mex_chappjc' 10 10 4.2507e-05
'solution_mex_omp_Amro' 10 10 0.00027411
'solution_mex_omp_chappjc' 10 10 0.00013017
'solution_sscanf_Divakar' 10 10 0.00020458
'solution_sscanf_char_LuisMendo' 10 10 0.00011144
'solution_textscan_sprintf_chappjc' 10 10 0.00010528
'func_eval_cellfun' 100 10 0.0011801
'func_eval_loop' 100 10 0.001059
'func_eval_sprintf' 100 10 0.00025547
'func_eval_strjoin' 100 10 0.0011824
'func_sscanf_cellfun' 100 10 0.0023356
'func_sscanf_loop' 100 10 0.0012338
'func_sscanf_sprintf' 100 10 0.00031012
'func_sscanf_strjoin' 100 10 0.0011334
'func_str2num_cellfun' 100 10 0.002635
'func_str2num_loop' 100 10 0.0028056
'func_str2num_sprintf' 100 10 0.00027899
'func_str2num_strjoin' 100 10 0.0012117
'func_textscan_cellfun' 100 10 0.0029546
'func_textscan_loop' 100 10 0.0018652
'func_textscan_sprintf' 100 10 0.00028506
'func_textscan_strjoin' 100 10 0.001125
'solution_bsxfun_bytestream_Divakar' 100 10 0.00040027
'solution_bsxfun_bytestream_gpu_Divakar' 100 10 0.0032536
'solution_bsxfun_cumsum_Divakar' 100 10 0.00041019
'solution_bsxfun_sprintf_Divakar' 100 10 0.00031089
'solution_bsxfun_sprintf_gpu_Divakar' 100 10 0.0026271
'solution_eval_loops_CSTLink' 100 10 0.0012294
'solution_loops_CSTLink' 100 10 0.00033501
'solution_mex_Amro' 100 10 0.00027069
'solution_mex_chappjc' 100 10 0.00010682
'solution_mex_omp_Amro' 100 10 0.00039385
'solution_mex_omp_chappjc' 100 10 0.00015232
'solution_sscanf_Divakar' 100 10 0.0010108
'solution_sscanf_char_LuisMendo' 100 10 0.00050153
'solution_textscan_sprintf_chappjc' 100 10 0.00026958
'func_eval_cellfun' 1000 10 0.0092491
'func_eval_loop' 1000 10 0.016145
'func_eval_sprintf' 1000 10 0.067573
'func_eval_strjoin' 1000 10 0.070024
'func_sscanf_cellfun' 1000 10 0.020954
'func_sscanf_loop' 1000 10 0.011224
'func_sscanf_sprintf' 1000 10 0.0022546
'func_sscanf_strjoin' 1000 10 0.0058568
'func_str2num_cellfun' 1000 10 0.024699
'func_str2num_loop' 1000 10 0.02645
'func_str2num_sprintf' 1000 10 0.05713
'func_str2num_strjoin' 1000 10 0.060093
'func_textscan_cellfun' 1000 10 0.02592
'func_textscan_loop' 1000 10 0.017589
'func_textscan_sprintf' 1000 10 0.0020249
'func_textscan_strjoin' 1000 10 0.0055364
'solution_bsxfun_bytestream_Divakar' 1000 10 0.0018817
'solution_bsxfun_bytestream_gpu_Divakar' 1000 10 0.0066003
'solution_bsxfun_cumsum_Divakar' 1000 10 0.001982
'solution_bsxfun_sprintf_Divakar' 1000 10 0.0015578
'solution_bsxfun_sprintf_gpu_Divakar' 1000 10 0.0046952
'solution_eval_loops_CSTLink' 1000 10 0.011481
'solution_loops_CSTLink' 1000 10 0.0027254
'solution_mex_Amro' 1000 10 0.0022698
'solution_mex_chappjc' 1000 10 0.0006967
'solution_mex_omp_Amro' 1000 10 0.0015025
'solution_mex_omp_chappjc' 1000 10 0.00041463
'solution_sscanf_Divakar' 1000 10 0.0093785
'solution_sscanf_char_LuisMendo' 1000 10 0.0038031
'solution_textscan_sprintf_chappjc' 1000 10 0.0020323
'func_eval_cellfun' 10000 10 0.083676
'func_eval_loop' 10000 10 0.098798
'func_eval_sprintf' 10000 10 0.60429
'func_eval_strjoin' 10000 10 0.63656
'func_sscanf_cellfun' 10000 10 0.20675
'func_sscanf_loop' 10000 10 0.1088
'func_sscanf_sprintf' 10000 10 0.021725
'func_sscanf_strjoin' 10000 10 0.052341
'func_str2num_cellfun' 10000 10 0.24192
'func_str2num_loop' 10000 10 0.26538
'func_str2num_sprintf' 10000 10 0.53451
'func_str2num_strjoin' 10000 10 0.56759
'func_textscan_cellfun' 10000 10 0.25474
'func_textscan_loop' 10000 10 0.17402
'func_textscan_sprintf' 10000 10 0.018799
'func_textscan_strjoin' 10000 10 0.04965
'solution_bsxfun_bytestream_Divakar' 10000 10 0.019165
'solution_bsxfun_bytestream_gpu_Divakar' 10000 10 0.031283
'solution_bsxfun_cumsum_Divakar' 10000 10 0.027986
'solution_bsxfun_sprintf_Divakar' 10000 10 0.017761
'solution_bsxfun_sprintf_gpu_Divakar' 10000 10 0.024821
'solution_eval_loops_CSTLink' 10000 10 0.10885
'solution_loops_CSTLink' 10000 10 0.025136
'solution_mex_Amro' 10000 10 0.021374
'solution_mex_chappjc' 10000 10 0.0060774
'solution_mex_omp_Amro' 10000 10 0.0076461
'solution_mex_omp_chappjc' 10000 10 0.002058
'solution_sscanf_Divakar' 10000 10 0.10503
'solution_sscanf_char_LuisMendo' 10000 10 0.035483
'solution_textscan_sprintf_chappjc' 10000 10 0.018772
'func_eval_cellfun' 1e+05 10 0.85115
'func_eval_loop' 1e+05 10 0.97977
'func_eval_sprintf' 1e+05 10 6.2422
'func_eval_strjoin' 1e+05 10 6.5012
'func_sscanf_cellfun' 1e+05 10 2.0761
'func_sscanf_loop' 1e+05 10 1.0865
'func_sscanf_sprintf' 1e+05 10 0.22618
'func_sscanf_strjoin' 1e+05 10 0.53146
'func_str2num_cellfun' 1e+05 10 2.4041
'func_str2num_loop' 1e+05 10 2.6431
'func_str2num_sprintf' 1e+05 10 5.4
'func_str2num_strjoin' 1e+05 10 5.6967
'func_textscan_cellfun' 1e+05 10 2.5696
'func_textscan_loop' 1e+05 10 1.7175
'func_textscan_sprintf' 1e+05 10 0.19759
'func_textscan_strjoin' 1e+05 10 0.50314
'solution_bsxfun_bytestream_Divakar' 1e+05 10 0.21884
'solution_bsxfun_bytestream_gpu_Divakar' 1e+05 10 0.23607
'solution_bsxfun_cumsum_Divakar' 1e+05 10 0.29511
'solution_bsxfun_sprintf_Divakar' 1e+05 10 0.19882
'solution_bsxfun_sprintf_gpu_Divakar' 1e+05 10 0.17923
'solution_eval_loops_CSTLink' 1e+05 10 1.0943
'solution_loops_CSTLink' 1e+05 10 0.2534
'solution_mex_Amro' 1e+05 10 0.21575
'solution_mex_chappjc' 1e+05 10 0.060666
'solution_mex_omp_Amro' 1e+05 10 0.072168
'solution_mex_omp_chappjc' 1e+05 10 0.024385
'solution_sscanf_Divakar' 1e+05 10 1.0992
'solution_sscanf_char_LuisMendo' 1e+05 10 0.36688
'solution_textscan_sprintf_chappjc' 1e+05 10 0.19755
Upvotes: 7
Reputation: 221504
Few things must be noted on the benchmark codes used for this benchmarking. This is same as Amro's benchmark codes with these changes -
1000
was added to all solutions to keep the consistency among solutions.1000000
and for that I had to keep the count of numbers in each cell as 8
to accommodate everything into memory. Also, I think this would present a fair comparison between vectorized and loopy solutions.gputimeit
was used, instead of timeit
.The codes are packaged and available here. In it, use bench_script.m
as the main function.
Benchmark Results
func nrows ncols time
__________________________ _____ _____ ________
'all_loops' 10000 8 0.026996
'char_and_sscanf_solution' 10000 8 0.034492
'eval_and_loops_solution' 10000 8 0.22548
'extIntsSprintfTextScan' 10000 8 0.036889
'func_eval_cellfun' 10000 8 0.16483
'func_eval_loop' 10000 8 0.1845
'func_sscanf_cellfun' 10000 8 0.40217
'func_sscanf_loop' 10000 8 0.19508
'func_sscanf_sprintf' 10000 8 0.027505
'func_sscanf_strjoin' 10000 8 0.11128
'func_str2num_cellfun' 10000 8 0.4976
'func_str2num_loop' 10000 8 0.5303
'func_textscan_cellfun' 10000 8 0.547
'func_textscan_loop' 10000 8 0.3752
'func_textscan_sprintf' 10000 8 0.036699
'func_textscan_strjoin' 10000 8 0.12059
'single_sscanf_solution' 10000 8 0.17122
'soln_bytstrm_bsxfun_gpu' 10000 8 0.023365
'soln_sprintf_bsxfun_gpu' 10000 8 0.019986
'solution_bytstrm_bsxfun' 10000 8 0.031165
'solution_cumsum_bsxfun' 10000 8 0.047445
'solution_sprintf_bsxfun' 10000 8 0.028417
'all_loops' 50000 8 0.13444
'char_and_sscanf_solution' 50000 8 0.1753
'eval_and_loops_solution' 50000 8 1.1242
'extIntsSprintfTextScan' 50000 8 0.1871
'func_eval_cellfun' 50000 8 0.82261
'func_eval_loop' 50000 8 0.91632
'func_sscanf_cellfun' 50000 8 2.0088
'func_sscanf_loop' 50000 8 0.97656
'func_sscanf_sprintf' 50000 8 0.13891
'func_sscanf_strjoin' 50000 8 0.56368
'func_str2num_cellfun' 50000 8 2.4786
'func_str2num_loop' 50000 8 2.6377
'func_textscan_cellfun' 50000 8 2.7452
'func_textscan_loop' 50000 8 1.8249
'func_textscan_sprintf' 50000 8 0.18556
'func_textscan_strjoin' 50000 8 0.60935
'single_sscanf_solution' 50000 8 0.90871
'soln_bytstrm_bsxfun_gpu' 50000 8 0.10591
'soln_sprintf_bsxfun_gpu' 50000 8 0.079611
'solution_bytstrm_bsxfun' 50000 8 0.18875
'solution_cumsum_bsxfun' 50000 8 0.27233
'solution_sprintf_bsxfun' 50000 8 0.16467
'all_loops' 80000 8 0.21602
'char_and_sscanf_solution' 80000 8 0.27855
'eval_and_loops_solution' 80000 8 1.7997
'extIntsSprintfTextScan' 80000 8 0.29733
'func_eval_cellfun' 80000 8 1.3171
'func_eval_loop' 80000 8 1.4647
'func_sscanf_cellfun' 80000 8 3.2232
'func_sscanf_loop' 80000 8 1.5664
'func_sscanf_sprintf' 80000 8 0.22136
'func_sscanf_strjoin' 80000 8 0.89605
'func_str2num_cellfun' 80000 8 3.9688
'func_str2num_loop' 80000 8 4.2199
'func_textscan_cellfun' 80000 8 4.3841
'func_textscan_loop' 80000 8 2.9181
'func_textscan_sprintf' 80000 8 0.29494
'func_textscan_strjoin' 80000 8 0.97383
'single_sscanf_solution' 80000 8 1.4542
'soln_bytstrm_bsxfun_gpu' 80000 8 0.15528
'soln_sprintf_bsxfun_gpu' 80000 8 0.11911
'solution_bytstrm_bsxfun' 80000 8 0.28552
'solution_cumsum_bsxfun' 80000 8 0.43238
'solution_sprintf_bsxfun' 80000 8 0.24801
'all_loops' 1e+05 8 0.26833
'char_and_sscanf_solution' 1e+05 8 0.34617
'eval_and_loops_solution' 1e+05 8 2.2465
'extIntsSprintfTextScan' 1e+05 8 0.37322
'func_eval_cellfun' 1e+05 8 1.641
'func_eval_loop' 1e+05 8 1.8339
'func_sscanf_cellfun' 1e+05 8 4.024
'func_sscanf_loop' 1e+05 8 1.9598
'func_sscanf_sprintf' 1e+05 8 0.27558
'func_sscanf_strjoin' 1e+05 8 1.1193
'func_str2num_cellfun' 1e+05 8 4.9592
'func_str2num_loop' 1e+05 8 5.2634
'func_textscan_cellfun' 1e+05 8 5.6636
'func_textscan_loop' 1e+05 8 3.981
'func_textscan_sprintf' 1e+05 8 0.36947
'func_textscan_strjoin' 1e+05 8 1.208
'single_sscanf_solution' 1e+05 8 1.8665
'soln_bytstrm_bsxfun_gpu' 1e+05 8 0.19389
'soln_sprintf_bsxfun_gpu' 1e+05 8 0.14588
'solution_bytstrm_bsxfun' 1e+05 8 0.35404
'solution_cumsum_bsxfun' 1e+05 8 0.54906
'solution_sprintf_bsxfun' 1e+05 8 0.30324
MATLAB Version: 8.3.0.532 (R2014a)
Operating System: Ubuntu 14.04 LTS 64-bit
RAM: 4GB
CPU Model: IntelĀ® PentiumĀ® Processor E5400 (2M Cache, 2.70 GHz)
GPU Model: GTX 750Ti 2GB
configinfo.m
output (important ones only) -
MATLAB accelerator enabled
MATLAB JIT: enabled
MATLAB assertions: disabled
MATLAB Desktop: enabled
Java JVM: enabled
CPU: x86 Family 6 Model 23 Stepping 10, GenuineIntel
Number of processors: 2
CPU speed is: 2700 MHz
RAM: 4046992 kB
Swap space: 9760764 kB
Number of cores: 2
Number of threads: 2
Upvotes: 2
Reputation: 221504
Approach #1
This first approach in a broad sense has two parts. The first one gets us one big long string that has all the characters from the cell array with underscores separating
two cells, which is essentially implemented with cumsum
. The second one based on bsxfun
gets us the desired numeric output in the desired format.
The code would look something like this -
function out = solution_cumsum_bsxfun(list_of_words)
N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
tlens = sum(lens); %// Total number of characters [total of lengths]
cumlens = cumsum(lens); %// Cumulative lengths
%// Create an array of ones and zeros.
%// The length of this array would be equal to sum of characters in all cells.
%// The ones would be at the positions where new cells start, zeros otherwise
startpos_newcell(1,tlens)=0;
startpos_newcell(cumlens(1:end-1)+1)=1;
%// Calculate new positions for all characters in the cell array with
%// places/positions left between two cells for putting underscores
newpos = (1:tlens) + cumsum(startpos_newcell);
%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells
one_str(newpos) = [list_of_words{:}];
one_str(cumlens + [1:numel(cumlens)]') = '_'; %//'#
pos_us = find(one_str=='_'); %// positions of underscores
wordend_idx = pos_us - [1:numel(pos_us)]; %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length
%// Create mask where digit characters are to be placed
mask = bsxfun(@ge,[1:max_wordlens]',[max_wordlens+1-wordlens]); %//'#
%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1(max_wordlens,size(mask,2)) = 0;
num1(mask) = one_str(one_str~='_') - 48;
%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
out = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#
return;
Approach #2
This a slight variation of approach #1 in that it does the job of first part with sprintf
. Now, the idea came to me after seeing it in action in
@chappjc's solution. I appreciate him on letting me use it in this modified version.
Here's the code -
function out = solution_sprintf_bsxfun(list_of_words)
N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells
one_str = sprintf('%s_',list_of_words{:});
pos_us = find(one_str=='_'); %// positions of underscores
wordend_idx = pos_us - [1:numel(pos_us)]; %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length
%// Create mask where digit characters are to be placed
mask = bsxfun(@ge,[1:max_wordlens]',[max_wordlens+1-wordlens]); %//'#
%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1(max_wordlens,size(mask,2)) = 0;
num1(mask) = one_str(one_str~='_') - 48;
%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
out = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#
return;
Approach #3
This is an altogether new approach based on getByteStreamFromArray
-
function out = solution_bytstrm_bsxfun(list_of_words)
allchars = [list_of_words{:}];
digits_array = getByteStreamFromArray(allchars);
%// At my 32-bit system getByteStreamFromArray gets the significant digits
%// from index 65 onwards. Thus, we need to crop/index it accordingly
digits_array = digits_array(65:65+numel(allchars)-1) - 48;
% --------------------------------------------------------------------
N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
cumlens = cumsum(lens)'; %//'# Cumulative lengths
% ----------------------------------------------------------
pos_us = find(digits_array==47);
starts = [[1 cumlens(1:end-1)+1];reshape(pos_us+1,N-1,[])];
ends = [reshape(pos_us-1,N-1,[]) ; cumlens];
gaps = ends(:) - starts(:) + 1;
maxg = max(gaps);
mask1 = bsxfun(@lt,maxg-gaps',[1:maxg]');
num_array(maxg,numel(lens)*N)=0;
num_array(mask1) = digits_array(digits_array~=47);
out = reshape(10.^(maxg-4:-1:-3)*num_array,N,[]).';
return;
GPU versions of the above approaches are listed next.
Approach #4
function out = soln_sprintf_bsxfun_gpu(list_of_words)
N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells. Now this one uses sprintf as
%// firstly proposed in @chappjc's solution and he has agreed to let me use
%// it, so appreciating his help on this.
digits_array = gpuArray(single(sprintf('%s_',list_of_words{:})));
digits_array = digits_array - 48;
mask_us = digits_array==47; %// mask of underscores
pos_us = find(mask_us);
wordend_idx = pos_us - gpuArray.colon(1,numel(pos_us)); %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length
%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1 = single(zeros(max_wordlens,numel(pos_us),'gpuArray')); %//'
num1(bsxfun(@ge,gpuArray.colon(1,max_wordlens)',...
max_wordlens+1-single(wordlens))) = digits_array(~mask_us); %//'
%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
outg = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#
out = gather(outg);
return;
Approach #5
function out = soln_bytstrm_bsxfun_gpu(list_of_words)
us_ascii_num = 95;
allchars = [list_of_words{:}];
%// At my 32-bit system getByteStreamFromArray gets the significant digits
%// from index 65 onwards. Thus, we need to crop/index it accordingly
alldigits = getByteStreamFromArray(allchars);
digits_array1 = gpuArray(alldigits(65:65+numel(allchars)-1));
% --------------------------------------------------------------------
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
N = sum(digits_array1(1:lens(1))==us_ascii_num)+1; %// Number of "words" per cell
lens = gpuArray(lens);
cumlens = cumsum(lens)'; %//'# Cumulative lengths
% ----------------------------------------------------------
mask_us = digits_array1==us_ascii_num; %// mask of underscores
pos_us = find(mask_us);
starts = [[1 cumlens(1:end-1)+1];reshape(pos_us+1,N-1,[])];
ends = [reshape(pos_us-1,N-1,[]) ; cumlens];
gaps = ends(:) - starts(:) + 1;
maxg = max(gaps);
mask1 = bsxfun(@lt,maxg-gaps',[1:maxg]');
num_array = zeros(maxg,numel(lens)*N,'gpuArray'); %//'
num_array(mask1) = digits_array1(~mask_us)-48;
out = reshape(10.^(maxg-4:-1:-3)*num_array,N,[]).'; %//'
out = gather(out);
return;
Upvotes: 5
Reputation: 30579
I'd apply sprintf
with a comma-separated list input to generate a single, fully-delimited 1D string (with delimiter at end of each line/string), and then use textscan
to pull out the integers:
extractIntsSprintfTextScan.m
function M = extractIntsSprintfTextScan(list_of_words)
% optimized lines, see below for explanation of each op
M = textscan(sprintf('%s_',C{:}),'%.0f', 'Delimiter','_');
M = reshape(M{1}, [], numel(C)).';
% cell to string, adding _ suffix
% s = sprintf('%s_',list_of_words{:});
% extract integers given _ as delimiter
% C = textscan(s,'%u','Delimiter','_','CollectOutput',1); % can also use %d for signed
% C = textscan(s,'%.0f_','CollectOutput',1); % a little faster to put _ in the pattern
% matrix form
% M = reshape(C{1},[],numel(list_of_words)).'/1000; % remove transpose for faster output
end
Benchmarks (previous timings now stale, see sprintf
+textscan
solution in Amro's benchmark post instead)
Machine
MATLAB R2014b 64-bit
Windows 7 64-bit
24 GB RAM
Dual Xeon X5550 (2.67GHZ, 8 physical cores)
UPDATE: Change output type to double from an integer type. Re-run benchmarks.
UPDATE 2: Change format specifier from '%f' to '%.0f', maybe allowing it to speed up the scan since no decimals are requested. Increase N = 100e3;
(see the GitHub Gist).
UPDATE 3: New benchmarks with version 2 of CST-Link's timing functions (see new GitHub Gist with suggested usage of referee_timing.m).
UPDATE 4: Add Luis' solution. Note that the results fluctuate since the data is randomly generated.
UPDATE 5: Optimization tweaks -- see Amro's benchmarking post.
Upvotes: 3
Reputation:
(Most recent Proposal)
This kind of eliminates the last MATLAB-specific part of my anti-MATLAB solution; pure byte-scanning, loop-in-loop, mem-allocating stuff:
%'all_loops.m'
function array_of_numbers = all_loops(list_of_words)
%'Precalculate important numbers'
n_numbers = 1 + sum(list_of_words{1}=='_');
n_words = numel(list_of_words);
%'Allocate memory'
array_of_numbers = zeros(n_numbers, n_words);
slice_of_numbers = zeros(n_numbers, 1);
%'Loop trough chunks of cell array'
for k = 1:n_words
str = list_of_words{k};
pos_max = size(str, 2);
value = 0;
index = 1;
for pos = 1:pos_max
if str(pos) == '_'
slice_of_numbers(index) = value;
value = 0;
index = index + 1;
else
value = 10*value + (str(pos) - '0');
end;
slice_of_numbers(index) = value; %'almost forgot'
end;
array_of_numbers(:, k) = slice_of_numbers;
end;
%'This is a waste of memory, but is kind of required'
array_of_numbers = transpose(array_of_numbers / 1000);
end
(Benchmarking) For the following configuration (obtained from configinfo.m)
MATLAB configuration information CPU: x86 Family 6 Model 58 Stepping 9, GenuineIntel
This data was gathered on: 24-Oct-2014 14:19:31 The measured CPU speed is: 2600 MHz
MATLAB version: 7.14.0.739 (R2012a) Number of processors: 4
MATLAB root: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx RAM: 3289 MB
MATLAB accelerator enabled Swap space: 6576 MB
MATLAB JIT: enabled Microsoft Windows 7
MATLAB assertions: disabled Number of cores: 2
MATLAB Desktop: enabled Number of threads: 2
the following results were obtained (3rd run):
Generating 10000-words test case...
Timing 1000000-words test case...
approach4: Error - Out of memory. Type HELP MEMORY for your options.
approach1: Error - Out of memory. Type HELP MEMORY for your options.
single_sscanf_solution: Error - Out of memory. Type HELP MEMORY for your options.
char_and_sscanf_solution: 5.076296[s]
extractIntsSprintfTextScan: 4.328066[s]
all_loops: 1.795730[s]
eval_and_loops_solution: 10.027541[s]
Generating 100000-words test case...
Timing 100000-words test case...
approach4: 0.252107[s]
approach1: 0.370727[s]
single_sscanf_solution: 1.364936[s]
char_and_sscanf_solution: 0.515599[s]
extractIntsSprintfTextScan: 0.444586[s]
all_loops: 0.179575[s]
eval_and_loops_solution: 1.010240[s]
Generating 10000-words test case...
Timing 10000-words test case...
approach4: 0.026642[s]
approach1: 0.039550[s]
single_sscanf_solution: 0.136711[s]
char_and_sscanf_solution: 0.049708[s]
extractIntsSprintfTextScan: 0.042608[s]
all_loops: 0.017636[s]
eval_and_loops_solution: 0.099111[s]
You may notice a difference in the last step, between "Generating 10000..." and "Timing 1000000..."; in order to decimate the occupied memory (otherwise everything would fail on generating the test case) i used repmat(referee_test_case(1e4), 100, 1)
instead of referee_test_case(1e6)
.
Upvotes: 2
Reputation: 124543
Challenge accepted! Here is my fastest implementation so far, and yes it's a C++ MEX-file :)
#include "mex.h"
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
namespace {
// get i-th string from cell-array of strings
std::string getString(const mxArray *cellstr, mwIndex idx)
{
mxArray *arr = mxGetCell(cellstr, idx);
if (arr == NULL) mexErrMsgIdAndTxt("mex:err", "null/uninitialized");
if (!mxIsChar(arr)) mexErrMsgIdAndTxt("mex:err", "not a string");
char *cstr = mxArrayToString(arr);
if (cstr == NULL) mexErrMsgIdAndTxt("mex:err", "null");
std::string str(cstr);
mxFree(cstr);
return str;
}
// count of numbers in char-delimited string
mwSize count_numbers(const std::string &s)
{
return std::count(s.begin(), s.end(), '_') + 1;
}
// parse numbers
template <typename T>
void parseNumbers(const mxArray *cellstr, const mwSize len, std::vector<T> &v)
{
// cell-array of strings
std::vector<std::string> vs;
vs.reserve(len);
for (mwIndex idx=0; idx<len; ++idx) {
vs.push_back(getString(cellstr, idx));
}
// join vector of strings into one
std::stringstream ss;
std::copy(vs.begin(), vs.end(),
std::ostream_iterator<std::string>(ss, "_"));
// split string into numbers (separated by single-char delimiter)
T num;
while (ss >> num) {
v.push_back(num);
ss.ignore(1);
}
}
};
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
// validate inputs
if (nrhs!=1 || nlhs>1) mexErrMsgIdAndTxt("mex:err", "wrong num args");
if (!mxIsCell(prhs[0])) mexErrMsgIdAndTxt("mex:err", "not cell");
// allocate output matrix
mwSize len = mxGetNumberOfElements(prhs[0]);
mwSize sz = (len > 0) ? count_numbers(getString(prhs[0],0)) : 0;
plhs[0] = mxCreateNumericMatrix(sz, len, mxDOUBLE_CLASS, mxREAL);
if (plhs[0] == NULL) mexErrMsgIdAndTxt("mex:err", "null");
if (len == 0 || sz == 0) return;
// parse cell array into numbers
std::vector<int> v;
v.reserve(len*sz);
parseNumbers(prhs[0], len, v);
if (v.size() != (len*sz)) mexErrMsgIdAndTxt("mex:err", "wrong size");
// copy numbers into output matrix
std::copy(v.begin(), v.end(), mxGetPr(plhs[0]));
}
The code is straightforward to read; I'm using the standard STL library where possible (no dirty tricks), plus there's plenty of checks and input validation, so it should robust as long you follow the input format described in the question.
I'll update with some benchmarks later, but for now you can test it on your own and compare against the other solutions...
Note the above MEX-function returns a matrix of size n_numbers * n_words
, so you'll want to transpose the result.
Here is a wrapper M-function you can use to run under the referee program:
function array_of_numbers = mex_solution(list_of_words)
array_of_numbers = solution_mex(list_of_words).';
end
Let's simplify the code a bit; this version uses much less memory by processing one string at a time, and placing the result directly into the output matrix:
#include "mex.h"
#include <string>
#include <sstream>
#include <algorithm>
namespace {
std::string getString(const mxArray *cellstr, const mwIndex idx)
{
// get i-th string from cell-array of strings
mxArray *arr = mxGetCell(cellstr, idx);
if (!arr || !mxIsChar(arr)) mexErrMsgIdAndTxt("mex:err", "not a string");
char *cstr = mxArrayToString(arr);
if (cstr == NULL) mexErrMsgIdAndTxt("mex:err", "null");
std::string str(cstr);
mxFree(cstr);
return str;
}
mwSize count_numbers(const std::string &s)
{
// count of numbers in char-delimited string
return std::count(s.begin(), s.end(), '_') + 1;
}
};
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
// validate inputs
if (nrhs!=1 || nlhs>1) mexErrMsgIdAndTxt("mex:err", "wrong num args");
if (!mxIsCell(prhs[0])) mexErrMsgIdAndTxt("mex:err", "not a cell");
// determine sizes
const mxArray *cellstr = prhs[0];
const mwSize n_words = mxGetNumberOfElements(cellstr);
const mwSize n_nums = (n_words > 0) ?
count_numbers(getString(cellstr,0)) : 0;
// allocate output matrix
plhs[0] = mxCreateDoubleMatrix(n_nums, n_words, mxREAL);
if (plhs[0] == NULL) mexErrMsgIdAndTxt("mex:err", "null");
if (n_words == 0 || n_nums == 0) return;
double *out = mxGetPr(plhs[0]);
// extract numbers from strings
for (mwIndex idx=0, i=0; idx<n_words; ++idx) {
std::istringstream ss(getString(cellstr, idx));
int num;
while(ss >> num) {
out[i++] = num;
ss.ignore(1);
}
}
}
Next step, let's make it multi-threaded; We can do this implicitly using OpenMP by just adding two lines of code! I am parallelizing over each string.
First we add the omp parallel for
pragma, then we make the index variable i
private to each thread, that way threads know the column starting index.
In the previous code from edit#1, just replace the last loop with:
// extract numbers from strings
#pragma omp parallel for
for (mwIndex idx=0; idx<n_words; ++idx) {
mwIndex i = idx*n_nums; // starting index for i-th column
std::istringstream ss(getString(cellstr, idx));
int num;
while(ss >> num) {
out[i++] = num;
ss.ignore(1);
}
}
I'm running R2014a on Windows x64, and I tried compiling with both VS2013 and MinGW-w64 GCC 4.9.1. Let me point out that the GCC-compiled version is way faster than all solutions so far:
% compile with MinGW-w64
>> mex -largeArrayDims -f mingwopts.bat solution_mex.cpp -output solution_mex_gcc
% set appropriate number of threads
>> setenv('OMP_NUM_THREADS','4');
% quick benchmark
>> timeit(@() solution_mex_gcc(repmat({'02_04_04_52_23_14_54_672_0'},1e6,1)).')
ans =
0.6658
Upvotes: 2
Reputation: 112659
The following solution seems to work:
function array_of_numbers = char_and_sscanf_solution(list_of_words)
s = char(list_of_words);
s(s==95) = 32; %// 95 is '_'; 32 is ' '
s = [ s repmat(32, numel(list_of_words), 1) ];
array_of_numbers = reshape(sscanf(s.','%i '), [], numel(list_of_words)).'/1000;
Benchmarking results using referee_timing.m
and referee_test_case.m
:
Generating 10000-words test case...
Timing 10000-words test case...
eval_and_loops_solution: 0.190642[s]
single_sscanf_solution: 0.234413[s]
approach1: 0.073901[s]
char_and_sscanf_solution: 0.068311[s]
Generating 100000-words test case...
Timing 100000-words test case...
eval_and_loops_solution: 1.957728[s]
single_sscanf_solution: 2.426764[s]
approach1: 0.766020[s]
char_and_sscanf_solution: 0.706387[s]
Generating 1000000-words test case...
Timing 1000000-words test case...
eval_and_loops_solution: 18.975746[s]
char_and_sscanf_solution: 7.147229[s]
Computer information:
Matlab R2010b
Windows 7 Home Premium 64 bits Service Pack 1
Pentium(R)Dual Core CPU T4300 2.10 GHz
4 GB RAM
Upvotes: 3