user2271770
user2271770

Reputation:

MATLAB performance benchmarking

The set-up:

This post is about testing the performances of solutions to the following problem:

A cell array of S strings formatted as numbers separated by underscores is given, e.g.:

        list_of_words = repmat({'02_04_04_52_23_14_54_672_0'},10,1);

Is guaranteed for all strings that:

  1. they contain only decimal digits and underscores;
  2. the number of underscores is the same;
  3. there are no two consecutive underscores.

Write MATLAB code that would convert the cell array of strings to a numeric matrix S×M of doubles (values 1000 time smaller), where S is the number of strings and M is the number of numbers in a string.

A very similar problem was posted on StackOverflow, with several solutions proposed. The original goal was improving speed performance.

The code:

Two of the solutions, designed for speed, seem to have wildly varying performances from one MATLAB installation to another, or even from one run to another. Also, they have really different implementation styles.

In the dark corner you'll find a solution that goes against MATLAB's most sacred tenets: eval is evil, and loops should be avoided at all costs. However, the code tries to optimise the speed by avoiding repeated memory allocations, by using the fast ways of converting strings to numbers, and by doing in-place operations:

%'eval_and_loops_solution.m'
function array_of_numbers = eval_and_loops_solution(list_of_words)

        n_numbers = 1 + sum(list_of_words{1}=='_');
        n_words   = numel(list_of_words);

        array_of_numbers = zeros(n_numbers, n_words);

        for k = 1:n_words
                temp = ['[', list_of_words{k}, ']'];
                temp(temp=='_') = ';';
                array_of_numbers(:,k) = eval(temp);
        end;

         %'this is a waste of memory, but is kind of required'
        array_of_numbers = transpose(array_of_numbers / 1000);

end

Note: The original solution returned the result as M×S double array. The code was adapted for S×M; however, this adaptation consumes twice the memory. And yes, I wrote this solution.

In the clear corner you'll find a solution true to MATLAB spirit, that avoids using loops and eval, favoring a single sscanf read of all the strings (which is a clever way of avoiding the overhead of calling the function S times):

%'single_sscanf_solution.m'
function array_of_numbers = single_sscanf_solution(list_of_words)

        N = 1 + sum(list_of_words{1}=='_'); %'hard-coded in the original'
        lens = cellfun(@numel,list_of_words);
        tlens = sum(lens);
        idx(1,tlens)=0;
        idx(cumsum(lens(1:end-1))+1)=1;
        idx2 = (1:tlens) + cumsum(idx);

        one_str(1:max(idx2)+1)='_';
        one_str(idx2) = [list_of_words{:}];
        delim = repmat('%d_',1,N*numel(lens));
        array_of_numbers = reshape(sscanf(one_str, delim),N,[])'./1000;

end

Note: This solution belongs to @Divakar.

The referee is made up of two functions: one that generates test cases, and one that does the timig.

The test case generator groups in a cell array underscore-delimited strings out of 10 random integers between 1 and 9999 (for the sake of simplicity); however, an implementation should only assume that the numbers are positive or zero, and the number should fit in a double:

%'referee_test_case.m'
function list_of_words = referee_test_case(n_words)

        NUM_PER_WORD  = 10;
        NUM_MAX       = 9999;

        word_format   = [repmat('%d_', 1, NUM_PER_WORD-1), '%d'];
        list_of_words = cell(n_words,1);

        fprintf('Generating %d-words test case...\n', n_words);
        for k = 1:n_words
                list_of_words{k} = sprintf(word_format, randi(NUM_MAX, [1, NUM_PER_WORD]));
        end;

end

The timing function takes as arguments a test case and an arbitrary number of processing function handles; it is implemented such as errors in one function should not bother the execution of the others. It uses timeit as recommended by @Divakar and @LuisMendo; for those who don't have this function in their default MATLAB installation, it may be downloaded from MATLAB Central / File Exchange:

%'referee_timing.m'
function referee_timing(test_case, varargin)
        %' Specify the test case as a cell array of strings, then the function '
        %' handles that will process the test case.                            '
        %'                                                                     '
        %' The function uses timeit() to evaluate the performance of different '
        %' processing handles; if your MATLAB installation does not have it    '
        %' natively, download the available version from File Exchange:        '
        %'                                                                     '
        %'     http://www.mathworks.com/matlabcentral/fileexchange/18798-timeit-benchmarking-function '

        fprintf('Timing %d-words test case...\n', numel(test_case));
        for k = 1:numel(varargin)
                try
                        t = timeit(@() varargin{k}(test_case));
                        fprintf('  %s: %f[s]\n', func2str(varargin{k}), t);
                catch ME
                        fprintf('  %s: Error - %s\n', func2str(varargin{k}), ME.message);
                end;
        end;
end

The problem:

As said before, the results seem to vary from one MATLAB installation to another, and even from one run to another. The problem that I propose is three-fold:

  1. On your specific MATLAB installation (version + OS + HW), how well the two solutions perform compared to eachother?
  2. Is it possible to improve one solution or the other in a sizable way?
  3. Are there even faster MATLAB-native (e.g. no MEX or special toolboxes) solutions based on completely different ideas/functions?

For point 1. (benchmarking), please create in your MATLAB path the four function files eval_and_loops_solution.m, single_sscanf_solution.m, referee_test_case.m, referee_timig.m and other functions that you might want to test (for example, the implementations proposed by answers). Use them for several numbers of words, e.g. like this script:

%'test.m'
clc;
feature accel on;  %'tune for speed'

%'low memory stress'
referee_timing( ...
   referee_test_case(10^4), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

%'medium memory stress'
referee_timing( ...
   referee_test_case(10^5), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

%'high memory stress'
referee_timing( ...
   referee_test_case(10^6), ...
   @eval_and_loops_solution, ...
   @single_sscanf_solution ... %'add here other functions'
);

When posting the results, please specify your MATLAB version, the OS, the size of RAM and the CPU model. Please be aware that these tests might take some time for large number of words! However, running this script will not alter the content of your current workspace.

For points 2. or 3. you can post code that use the same in/out conventions as eval_and_loops_solution.m and single_sscanf_solution.m, along with the benchmarking that supports the claim.

The bounties:

I will up-vote every benchmarking answer, and I encourage everybody to do that. Is the least that one can do for people that contribute with their skills, time and processing power.

A +500 bounty bonus will go to the author of the fastest code, be it one of the two proposed, or a third new one that outperforms them. I really hope that this will match the general agreement. The bonus will be given in a month week from the original date of posting.

Upvotes: 17

Views: 1551

Answers (9)

chappjc
chappjc

Reputation: 30579

This builds on Amro's MEX solution. Since the problem as defined by CST-Link is so tightly constrained, a fast implementation can be made by sacrificing robustness and forgoing error handling. Hence, format the input as specified!

Substitute the main loop in Amro's source with the following for ~4x speed up, without multi-threading (full source):

// extract numbers from strings
for (mwIndex idx=0, i=0; idx<n_words; ++idx) {
    std::string str = getString(cellstr, idx);

    std::string::const_iterator istr = str.cbegin();
    for (; istr != str.cend(); ++istr) {
        if (*istr < '0' || *istr > '9')
        {
            ++i;
            continue;
        }
            out[i] *= 10;
            out[i] += *istr - '0';
    }
    ++i;
}

More manual optimization can be done, but I'm going to bed. Also, I'm planning to the multi-threaded version at some point, but it's pretty straightforward to add the pragma.


Benchmarks (UPDATED T-minus 12 hrs until end of grace period)

Benchmarks with Amro's latest package of functions plus Ben Voigt's solution show different timings on different machines, but for what it's worth it looks like there is a virtual tie between a few solutions. On my i7 laptop with R2014a 64-bit on Windows 8 (on CUDA), my textscan, Divakar's latest and Ben Voigt's solutions and are best, with indistinguishable performance differences. CST-Link's "all loops" is very close but consistently a hair slower than the others. These benchmarks use up to 1e6 strings. The size of the cell array decides the fastest method.

Taking out the MEX solutions (and GPU because my of my laptop), the solutions are placed as follows for various numbers of rows. Again, the results suggest that the appropriate method depends on the number of rows:

i7 16GB R2014a

              Solution                  1e6     5e5     1e5     1e4     1e3     100
____________________________________    ____    ____    ____    ____    ____    ____

'solution_textscan_sprintf_chappjc'      1       2       2       5       5       4  
'solution_bsxfun_sprintf_Divakar'        2       1       3       2       1       8  
'func_voigt_loop'                        3       3       5       1       2       1  
'func_textscan_sprintf'                  4       4       4       4       6       3  
'solution_bsxfun_bytestream_Divakar'     5       5       6       3       3      10  
'solution_loops_CSTLink'                 6       7       7       7       8       6  
'func_sscanf_sprintf'                    7       6       1       6       7       7  
'solution_bsxfun_cumsum_Divakar'         8       8       8       8       4       9  
'solution_sscanf_char_LuisMendo'         9       9       9       9       9      11  
'func_textscan_strjoin'                 10      10      15      10      10      16  
'func_sscanf_strjoin'                   11      11      10      11      11      19  
'func_eval_cellfun'                     12      12      13      12      12      18  
'func_eval_loop'                        13      13      11      13      13      12  
'solution_eval_loops_CSTLink'           14      15      14      14      14      13  
'func_sscanf_loop'                      15      14      12      15      15      15  
'func_textscan_loop'                    16      18      19      18      18      21  
'func_sscanf_cellfun'                   17      17      16      17      17      22  
'func_str2num_cellfun'                  18      19      18      19      19      23  
'func_str2num_loop'                     19      20      20      20      20      24  
'solution_sscanf_Divakar'               20      16      17      16      16      20  
'func_textscan_cellfun'                 21      21      21      21      21      25  
'func_str2num_sprintf'                  22      23      23      22      22       5  
'func_str2num_strjoin'                  23      24      25      23      23      17  
'func_eval_sprintf'                     24      22      24      24      24       2  
'func_eval_strjoin'                     25      25      22      25      25      14 

enter image description here

enter image description here

>> t
t = 
                    func                    nrows    ncols       time   
    ____________________________________    _____    _____    __________

    'func_eval_cellfun'                       100    10       0.00074097
    'func_eval_loop'                          100    10        0.0006137
    'func_eval_sprintf'                       100    10       0.00017814
    'func_eval_strjoin'                       100    10       0.00068062
    'func_sscanf_cellfun'                     100    10        0.0012905
    'func_sscanf_loop'                        100    10       0.00069992
    'func_sscanf_sprintf'                     100    10       0.00022678
    'func_sscanf_strjoin'                     100    10       0.00075428
    'func_str2num_cellfun'                    100    10        0.0014366
    'func_str2num_loop'                       100    10        0.0014904
    'func_str2num_sprintf'                    100    10       0.00020667
    'func_str2num_strjoin'                    100    10       0.00073786
    'func_textscan_cellfun'                   100    10        0.0018517
    'func_textscan_loop'                      100    10        0.0012629
    'func_textscan_sprintf'                   100    10       0.00020092
    'func_textscan_strjoin'                   100    10       0.00071305
    'func_voigt_loop'                         100    10       0.00017711
    'solution_bsxfun_bytestream_Divakar'      100    10       0.00029257
    'solution_bsxfun_cumsum_Divakar'          100    10       0.00028395
    'solution_bsxfun_sprintf_Divakar'         100    10       0.00023879
    'solution_eval_loops_CSTLink'             100    10       0.00066461
    'solution_loops_CSTLink'                  100    10       0.00021923
    'solution_mex_Amro'                       100    10       0.00020502
    'solution_mex_chappjc'                    100    10       6.3164e-05
    'solution_mex_omp_Amro'                   100    10       0.00018224
    'solution_mex_omp_chappjc'                100    10       8.2565e-05
    'solution_sscanf_Divakar'                 100    10       0.00084008
    'solution_sscanf_char_LuisMendo'          100    10       0.00033844
    'solution_textscan_sprintf_chappjc'       100    10       0.00020396
    'func_eval_cellfun'                      1000    10        0.0058036
    'func_eval_loop'                         1000    10        0.0060269
    'func_eval_sprintf'                      1000    10         0.055797
    'func_eval_strjoin'                      1000    10         0.057631
    'func_sscanf_cellfun'                    1000    10         0.011712
    'func_sscanf_loop'                       1000    10        0.0067405
    'func_sscanf_sprintf'                    1000    10        0.0019112
    'func_sscanf_strjoin'                    1000    10        0.0040608
    'func_str2num_cellfun'                   1000    10         0.013712
    'func_str2num_loop'                      1000    10         0.014961
    'func_str2num_sprintf'                   1000    10          0.04916
    'func_str2num_strjoin'                   1000    10         0.051823
    'func_textscan_cellfun'                  1000    10         0.017256
    'func_textscan_loop'                     1000    10         0.012454
    'func_textscan_sprintf'                  1000    10        0.0016489
    'func_textscan_strjoin'                  1000    10        0.0038387
    'func_voigt_loop'                        1000    10        0.0012892
    'solution_bsxfun_bytestream_Divakar'     1000    10        0.0013951
    'solution_bsxfun_cumsum_Divakar'         1000    10        0.0015138
    'solution_bsxfun_sprintf_Divakar'        1000    10        0.0011496
    'solution_eval_loops_CSTLink'            1000    10        0.0061538
    'solution_loops_CSTLink'                 1000    10        0.0020528
    'solution_mex_Amro'                      1000    10        0.0019629
    'solution_mex_chappjc'                   1000    10       0.00051825
    'solution_mex_omp_Amro'                  1000    10       0.00085117
    'solution_mex_omp_chappjc'               1000    10       0.00025825
    'solution_sscanf_Divakar'                1000    10        0.0078551
    'solution_sscanf_char_LuisMendo'         1000    10        0.0031104
    'solution_textscan_sprintf_chappjc'      1000    10        0.0016144
    'func_eval_cellfun'                     10000    10          0.05772
    'func_eval_loop'                        10000    10         0.061705
    'func_eval_sprintf'                     10000    10          0.54464
    'func_eval_strjoin'                     10000    10          0.57007
    'func_sscanf_cellfun'                   10000    10           0.1192
    'func_sscanf_loop'                      10000    10         0.068017
    'func_sscanf_sprintf'                   10000    10         0.019622
    'func_sscanf_strjoin'                   10000    10         0.038232
    'func_str2num_cellfun'                  10000    10          0.13811
    'func_str2num_loop'                     10000    10          0.14812
    'func_str2num_sprintf'                  10000    10          0.48726
    'func_str2num_strjoin'                  10000    10          0.50528
    'func_textscan_cellfun'                 10000    10          0.17378
    'func_textscan_loop'                    10000    10           0.1241
    'func_textscan_sprintf'                 10000    10         0.016595
    'func_textscan_strjoin'                 10000    10         0.035599
    'func_voigt_loop'                       10000    10         0.012136
    'solution_bsxfun_bytestream_Divakar'    10000    10         0.015908
    'solution_bsxfun_cumsum_Divakar'        10000    10          0.02301
    'solution_bsxfun_sprintf_Divakar'       10000    10         0.014862
    'solution_eval_loops_CSTLink'           10000    10         0.063188
    'solution_loops_CSTLink'                10000    10         0.020153
    'solution_mex_Amro'                     10000    10         0.019252
    'solution_mex_chappjc'                  10000    10        0.0051221
    'solution_mex_omp_Amro'                 10000    10        0.0066551
    'solution_mex_omp_chappjc'              10000    10        0.0014584
    'solution_sscanf_Divakar'               10000    10         0.096345
    'solution_sscanf_char_LuisMendo'        10000    10         0.031047
    'solution_textscan_sprintf_chappjc'     10000    10         0.016736
    'func_eval_cellfun'                     1e+05    10          0.78876
    'func_eval_loop'                        1e+05    10           0.6119
    'func_eval_sprintf'                     1e+05    10           6.7603
    'func_eval_strjoin'                     1e+05    10           5.7204
    'func_sscanf_cellfun'                   1e+05    10           1.2096
    'func_sscanf_loop'                      1e+05    10          0.68303
    'func_sscanf_sprintf'                   1e+05    10          0.21101
    'func_sscanf_strjoin'                   1e+05    10          0.55226
    'func_str2num_cellfun'                  1e+05    10            1.411
    'func_str2num_loop'                     1e+05    10           1.8229
    'func_str2num_sprintf'                  1e+05    10           6.1474
    'func_str2num_strjoin'                  1e+05    10            7.551
    'func_textscan_cellfun'                 1e+05    10           2.5898
    'func_textscan_loop'                    1e+05    10           1.7934
    'func_textscan_sprintf'                 1e+05    10          0.25421
    'func_textscan_strjoin'                 1e+05    10           1.1762
    'func_voigt_loop'                       1e+05    10          0.25602
    'solution_bsxfun_bytestream_Divakar'    1e+05    10            0.265
    'solution_bsxfun_cumsum_Divakar'        1e+05    10          0.35656
    'solution_bsxfun_sprintf_Divakar'       1e+05    10          0.23481
    'solution_eval_loops_CSTLink'           1e+05    10          0.86425
    'solution_loops_CSTLink'                1e+05    10          0.28436
    'solution_mex_Amro'                     1e+05    10          0.27104
    'solution_mex_chappjc'                  1e+05    10         0.078901
    'solution_mex_omp_Amro'                 1e+05    10         0.096553
    'solution_mex_omp_chappjc'              1e+05    10          0.03679
    'solution_sscanf_Divakar'               1e+05    10           1.3818
    'solution_sscanf_char_LuisMendo'        1e+05    10          0.43994
    'solution_textscan_sprintf_chappjc'     1e+05    10          0.21271
    'func_eval_cellfun'                     5e+05    10           3.7658
    'func_eval_loop'                        5e+05    10           3.8106
    'func_eval_sprintf'                     5e+05    10           32.383
    'func_eval_strjoin'                     5e+05    10           40.451
    'func_sscanf_cellfun'                   5e+05    10           8.5704
    'func_sscanf_loop'                      5e+05    10            4.707
    'func_sscanf_sprintf'                   5e+05    10           1.4362
    'func_sscanf_strjoin'                   5e+05    10           2.8301
    'func_str2num_cellfun'                  5e+05    10           9.6439
    'func_str2num_loop'                     5e+05    10           10.453
    'func_str2num_sprintf'                  5e+05    10           35.818
    'func_str2num_strjoin'                  5e+05    10           37.277
    'func_textscan_cellfun'                 5e+05    10           12.418
    'func_textscan_loop'                    5e+05    10           8.8237
    'func_textscan_sprintf'                 5e+05    10           1.2793
    'func_textscan_strjoin'                 5e+05    10           2.6496
    'func_voigt_loop'                       5e+05    10           1.2486
    'solution_bsxfun_bytestream_Divakar'    5e+05    10            1.324
    'solution_bsxfun_cumsum_Divakar'        5e+05    10           1.8229
    'solution_bsxfun_sprintf_Divakar'       5e+05    10           1.2113
    'solution_eval_loops_CSTLink'           5e+05    10           6.5759
    'solution_loops_CSTLink'                5e+05    10           1.4583
    'solution_mex_Amro'                     5e+05    10           1.3718
    'solution_mex_chappjc'                  5e+05    10          0.39711
    'solution_mex_omp_Amro'                 5e+05    10          0.48046
    'solution_mex_omp_chappjc'              5e+05    10          0.48174
    'solution_sscanf_Divakar'               5e+05    10           7.7943
    'solution_sscanf_char_LuisMendo'        5e+05    10           2.2332
    'solution_textscan_sprintf_chappjc'     5e+05    10           1.2399
    'func_eval_cellfun'                     1e+06    10           7.3884
    'func_eval_loop'                        1e+06    10           7.5519
    'func_eval_sprintf'                     1e+06    10           69.868
    'func_eval_strjoin'                     1e+06    10           71.964
    'func_sscanf_cellfun'                   1e+06    10           15.061
    'func_sscanf_loop'                      1e+06    10           8.4163
    'func_sscanf_sprintf'                   1e+06    10           2.7099
    'func_sscanf_strjoin'                   1e+06    10           5.1453
    'func_str2num_cellfun'                  1e+06    10            17.42
    'func_str2num_loop'                     1e+06    10           18.165
    'func_str2num_sprintf'                  1e+06    10           60.902
    'func_str2num_strjoin'                  1e+06    10           63.579
    'func_textscan_cellfun'                 1e+06    10           20.423
    'func_textscan_loop'                    1e+06    10           14.309
    'func_textscan_sprintf'                 1e+06    10           2.2853
    'func_textscan_strjoin'                 1e+06    10           4.5216
    'func_voigt_loop'                       1e+06    10           2.2443
    'solution_bsxfun_bytestream_Divakar'    1e+06    10           2.3495
    'solution_bsxfun_cumsum_Divakar'        1e+06    10           3.3843
    'solution_bsxfun_sprintf_Divakar'       1e+06    10           2.0311
    'solution_eval_loops_CSTLink'           1e+06    10           7.7524
    'solution_loops_CSTLink'                1e+06    10           2.4947
    'solution_mex_Amro'                     1e+06    10            2.486
    'solution_mex_chappjc'                  1e+06    10          0.76551
    'solution_mex_omp_Amro'                 1e+06    10          0.92226
    'solution_mex_omp_chappjc'              1e+06    10          0.88736
    'solution_sscanf_Divakar'               1e+06    10           19.673
    'solution_sscanf_char_LuisMendo'        1e+06    10           3.8578
    'solution_textscan_sprintf_chappjc'     1e+06    10           2.0074

Next...

X5550 24GB R2014b

The order is different, but the differences are again insignificant.

enter image description here

enter image description here

Timings exceed 30000 character limit, but I can post them somewhere if desired. The order is clear though.

I suggest CST-Link disregard all these measurements in deciding.


Of course, the MEX solutions rule. The solution above with std::string::const_iterator istr is fast, even faster with OpenMP. GCC 4.9.1 (pthreads) and VS2013 have comparable performance for this code. Threaded source here.

Upvotes: 6

Ben Voigt
Ben Voigt

Reputation: 283614

Here, in the MATLAB holiday spirit, is a highly-vectorized solution:

function M = func_voigt_loop(C)
    S = sprintf('_%s', C{:});
    digitpos = [find(S(2:end) == '_'), numel(S)];
    M = zeros(size(digitpos));
    place = 1;
    while 1
        digits = S(digitpos);
        mask = double(digits ~= '_');
        M = M + mask.*(digits - '0') * place;
        place = place * 10;
        digitpos = digitpos - mask;
        if ~any(mask); break; end
    end
    M = reshape(M, [], numel(C))';
end

It's only a few lines, easy to understand (grab the ones digit of every number, add in the tens digit, etc), and doesn't use any "exotic" functions like eval or textscan. It's also pretty fast.

This is an idea similar to something I previously developed for doing datenum on an entire column of data in a CSV file, the MATLAB version, even with an particular pattern specified, was just horribly slow. A later release of MATLAB improved datenum significantly, but my hand-tuned version still beats it handily.

Upvotes: 2

Amro
Amro

Reputation: 124543

Here is a bunch of implementations for you to benchmark (all pure MATLAB). Some of them borrow ideas from the other solutions.

function M = func_eval_strjoin(C)
    M = eval(['[' strjoin(strrep(C,'_',' ').',';') ']']);
end

function M = func_eval_sprintf(C)
    C = strrep(C,'_',' ');
    M = eval(['[' sprintf('%s;',C{:}) ']']);
end

function M = func_eval_cellfun(C)
    M = cell2mat(cellfun(@eval, strcat('[',strrep(C,'_',' '),']'), 'Uniform',false));
end

function M = func_eval_loop(C)
    M = zeros(numel(C),nnz(C{1}=='_')+1);
    for i=1:numel(C)
        M(i,:) = eval(['[' strrep(C{i},'_',' ') ']']);
    end
end
function M = func_str2num_strjoin(C)
    M = reshape(str2num(strjoin(strrep(C.','_',' '))), [], numel(C)).';
end

function M = func_str2num_sprintf(C)
    C = strrep(C,'_',' ');
    M = reshape(str2num(sprintf('%s ',C{:})), [], numel(C)).';
end

function M = func_str2num_cellfun(C)
    M = cell2mat(cellfun(@str2num, strrep(C, '_', ' '), 'Uniform',false));
end

function M = func_str2num_loop(C)
    M = zeros(numel(C),nnz(C{1}=='_')+1);
    for i=1:numel(C)
        M(i,:) = str2num(strrep(C{i}, '_', ' '));
    end
end
function M = func_sscanf_strjoin(C)
    M = reshape(sscanf(strjoin(C', '_'), '%f_'), [], numel(C)).';
end

function M = func_sscanf_sprintf(C)
    M = reshape(sscanf(sprintf('%s_',C{:}),'%f_'), [], numel(C)).';
end

function M = func_sscanf_cellfun(C)
    M = cell2mat(cellfun(@(c) sscanf(c, '%f_'), C, 'Uniform',false).').';
end

function M = func_sscanf_loop(C)
    M = zeros(numel(C),nnz(C{1}=='_')+1);
    for i=1:numel(C)
        M(i,:) = sscanf(C{i}, '%f_');
    end
end
function M = func_textscan_strjoin(C)
    M = textscan(strjoin(C', '_'), '%.0f', 'Delimiter','_');
    M = reshape(M{1}, [], numel(C)).';
end

function M = func_textscan_sprintf(C)
    M = textscan(sprintf('%s_',C{:}),'%.0f', 'Delimiter','_');
    M = reshape(M{1}, [], numel(C)).';
end

function M = func_textscan_cellfun(C)
    M = cell2mat(cellfun(@(str) textscan(str, '%.0f', 'Delimiter','_'), C).').';
end

function M = func_textscan_loop(C)
    M = zeros(numel(C),nnz(C{1}=='_')+1);
    for i=1:numel(C)
        x = textscan(C{i}, '%.0f', 'Delimiter','_');
        M(i,:) = x{1};
    end
end
function M = func_str2double_strsplit_strjoin(C)
    M = reshape(str2double(strsplit(strjoin(C','_'),'_')), [], numel(C)).';
end

function M = func_str2double_strsplit_sprintf(C)
    M = strsplit(sprintf('%s_',C{:}), '_');
    M = reshape(str2double(M(1:end-1)), [], numel(C)).';
end

function M = func_str2double_strsplit(C)
    M = cellfun(@(c) strsplit(c,'_'), C, 'Uniform',false);
    M = str2double(cat(1,M{:}));
end

function M = func_str2double_strsplit_cellfun(C)
    M = cell2mat(cellfun(@(c) str2double(strsplit(c,'_')), C, 'Uniform',false));
end

function M = func_str2double_strsplit_loop(C)
    M = zeros(numel(C),nnz(C{1}=='_')+1);
    for i=1:numel(C)
        M(i,:) = str2double(strsplit(C{i},'_'));
    end
end

function M = func_str2double_regex_split_strjoin(C)
    M = reshape(str2double(regexp(strjoin(C.','_'), '_', 'split')), [], numel(C)).';
end

function M = func_str2double_regex_split_sprintf(C)
    M = regexp(sprintf('%s_',C{:}), '_', 'split');
    M = reshape(str2double(M(1:end-1)), [], numel(C)).';
end

function M = func_str2double_regex_split(C)
    M = regexp(C, '_', 'split');
    M = reshape(str2double([M{:}]), [], numel(C)).';
end

function M = func_str2double_regex_split_cellfun(C)
    M = cell2mat(cellfun(@str2double, regexp(C, '_', 'split'), 'Uniform',false));
end

function M = func_str2double_regex_split_loop(C)
    M = zeros(numel(C),nnz(C{1}=='_')+1);
    for i=1:numel(C)
        M(i,:) = str2double(regexp(C{i}, '_', 'split'));
    end
end
function M = func_str2double_regex_tokens_strjoin_1(C)
    M = reshape(cellfun(@str2double, regexp(strjoin(C.','_'), '(\d+)', 'tokens')), [], numel(C)).';
end

function M = func_str2double_regex_tokens_strjoin_2(C)
    M = regexp(strjoin(C.','_'), '(\d+)', 'tokens');
    M = reshape(str2double([M{:}]), [], numel(C)).';
end

function M = func_str2double_regex_tokens_sprintf_1(C)
    M = reshape(cellfun(@str2double, regexp(sprintf('%s_',C{:}), '(\d+)', 'tokens')), [], numel(C)).';
end

function M = func_str2double_regex_tokens_sprintf_2(C)
    M = regexp(sprintf('%s_',C{:}), '(\d+)', 'tokens');
    M = reshape(str2double([M{:}]), [], numel(C)).';
end

function M = func_str2double_regex_tokens(C)
    M = regexp(C, '(\d+)', 'tokens');
    M = cat(1,M{:});
    M = reshape(str2double([M{:}]), size(M));
end

function M = func_str2double_regex_tokens_cellfun(C)
    M = regexp(C, '(\d+)', 'tokens');
    M = cellfun(@str2double, cat(1,M{:}));
end

function M = func_str2double_regex_tokens_loop(C)
    M = zeros(numel(C),nnz(C{1}=='_')+1);
    for i=1:numel(C)
        x = regexp(C{i}, '(\d+)', 'tokens');
        M(i,:) = str2double([x{:}]);
    end
end

Most methods are variations of the same idea, only with slightly different implementations (e.g: using explicit for-loop vs. cellfun, using strjoin vs. sprintf to join a cell array of strings into one string, etc..).

Let me break it down a bit more:

  • there's the eval-based solutions. We either apply eval in a loop, or we make a single call after flattening the strings into one.

  • similarly there are solutions based on calling str2num (after replacing underscores with spaces). It's worth noting that str2num itself calls eval internally.

  • there's also the sscanf and textscan solutions. Like before, we either use them in a loop, or called on one long string.

  • another set of solutions are based on calling str2double after splitting the strings by the underscore delimiter. Again there alternative ways to split the strings (using regex with the "split" option, or using the strsplit function).

  • finally there a set of solutions based on regular expression matching.


EDIT:

I created a set of functions for benchmarking the various implementations (GitHub Gist). Here are the pre-packaged files with all solutions posted so far. I included the compiled MEX-functions (for 64-bit Windows), along with MinGW-w64 DLL dependencies.

Here are the results of running on my machine (Laptop with quad-core Intel Core i7 CPU, 8GB, Win8.1, MATLAB R2014a). I tested the following sizes: 10x10, 100x10, 1000x10, and 10000x10. As you can see, the MEX solutions do better as you increase the data size...


EDIT#2:

As requested, I updated my test results with the latest solutions; I've added the modified versions of the MEX-file by @chappjc, and the GPU versions by @Divakar. Here is the updated archive of files.

I'm using the same machine as before. I've increased the cell-array size up to 1e5. My GPU device is decent for a laptop:

>> gpuDevice
ans = 
  CUDADevice with properties:

                      Name: 'GeForce GT 630M'
                     Index: 1
         ComputeCapability: '2.1'
            SupportsDouble: 1
             DriverVersion: 5.5000
            ToolkitVersion: 5.5000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [65535 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 2.1475e+09
                FreeMemory: 2.0453e+09
       MultiprocessorCount: 2
              ClockRateKHz: 950000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

Here are the latest results:

fig1 fig2

>> t
t = 
                      func                      nrows    ncols       time   
    ________________________________________    _____    _____    __________
    'func_eval_cellfun'                            10    10       0.00044645
    'func_eval_loop'                               10    10        0.0001554
    'func_eval_sprintf'                            10    10       7.6547e-05
    'func_eval_strjoin'                            10    10       0.00056739
    'func_sscanf_cellfun'                          10    10       0.00037247
    'func_sscanf_loop'                             10    10       0.00017182
    'func_sscanf_sprintf'                          10    10       8.4928e-05
    'func_sscanf_strjoin'                          10    10       0.00056388
    'func_str2num_cellfun'                         10    10       0.00039231
    'func_str2num_loop'                            10    10       0.00033852
    'func_str2num_sprintf'                         10    10       0.00010862
    'func_str2num_strjoin'                         10    10       0.00057953
    'func_textscan_cellfun'                        10    10       0.00044585
    'func_textscan_loop'                           10    10       0.00024666
    'func_textscan_sprintf'                        10    10       9.4507e-05
    'func_textscan_strjoin'                        10    10       0.00056123
    'solution_bsxfun_bytestream_Divakar'           10    10       0.00018166
    'solution_bsxfun_bytestream_gpu_Divakar'       10    10        0.0029487
    'solution_bsxfun_cumsum_Divakar'               10    10       0.00016396
    'solution_bsxfun_sprintf_Divakar'              10    10       0.00012932
    'solution_bsxfun_sprintf_gpu_Divakar'          10    10         0.002315
    'solution_eval_loops_CSTLink'                  10    10       0.00017191
    'solution_loops_CSTLink'                       10    10       6.5514e-05
    'solution_mex_Amro'                            10    10       6.4487e-05
    'solution_mex_chappjc'                         10    10       4.2507e-05
    'solution_mex_omp_Amro'                        10    10       0.00027411
    'solution_mex_omp_chappjc'                     10    10       0.00013017
    'solution_sscanf_Divakar'                      10    10       0.00020458
    'solution_sscanf_char_LuisMendo'               10    10       0.00011144
    'solution_textscan_sprintf_chappjc'            10    10       0.00010528
    'func_eval_cellfun'                           100    10        0.0011801
    'func_eval_loop'                              100    10         0.001059
    'func_eval_sprintf'                           100    10       0.00025547
    'func_eval_strjoin'                           100    10        0.0011824
    'func_sscanf_cellfun'                         100    10        0.0023356
    'func_sscanf_loop'                            100    10        0.0012338
    'func_sscanf_sprintf'                         100    10       0.00031012
    'func_sscanf_strjoin'                         100    10        0.0011334
    'func_str2num_cellfun'                        100    10         0.002635
    'func_str2num_loop'                           100    10        0.0028056
    'func_str2num_sprintf'                        100    10       0.00027899
    'func_str2num_strjoin'                        100    10        0.0012117
    'func_textscan_cellfun'                       100    10        0.0029546
    'func_textscan_loop'                          100    10        0.0018652
    'func_textscan_sprintf'                       100    10       0.00028506
    'func_textscan_strjoin'                       100    10         0.001125
    'solution_bsxfun_bytestream_Divakar'          100    10       0.00040027
    'solution_bsxfun_bytestream_gpu_Divakar'      100    10        0.0032536
    'solution_bsxfun_cumsum_Divakar'              100    10       0.00041019
    'solution_bsxfun_sprintf_Divakar'             100    10       0.00031089
    'solution_bsxfun_sprintf_gpu_Divakar'         100    10        0.0026271
    'solution_eval_loops_CSTLink'                 100    10        0.0012294
    'solution_loops_CSTLink'                      100    10       0.00033501
    'solution_mex_Amro'                           100    10       0.00027069
    'solution_mex_chappjc'                        100    10       0.00010682
    'solution_mex_omp_Amro'                       100    10       0.00039385
    'solution_mex_omp_chappjc'                    100    10       0.00015232
    'solution_sscanf_Divakar'                     100    10        0.0010108
    'solution_sscanf_char_LuisMendo'              100    10       0.00050153
    'solution_textscan_sprintf_chappjc'           100    10       0.00026958
    'func_eval_cellfun'                          1000    10        0.0092491
    'func_eval_loop'                             1000    10         0.016145
    'func_eval_sprintf'                          1000    10         0.067573
    'func_eval_strjoin'                          1000    10         0.070024
    'func_sscanf_cellfun'                        1000    10         0.020954
    'func_sscanf_loop'                           1000    10         0.011224
    'func_sscanf_sprintf'                        1000    10        0.0022546
    'func_sscanf_strjoin'                        1000    10        0.0058568
    'func_str2num_cellfun'                       1000    10         0.024699
    'func_str2num_loop'                          1000    10          0.02645
    'func_str2num_sprintf'                       1000    10          0.05713
    'func_str2num_strjoin'                       1000    10         0.060093
    'func_textscan_cellfun'                      1000    10          0.02592
    'func_textscan_loop'                         1000    10         0.017589
    'func_textscan_sprintf'                      1000    10        0.0020249
    'func_textscan_strjoin'                      1000    10        0.0055364
    'solution_bsxfun_bytestream_Divakar'         1000    10        0.0018817
    'solution_bsxfun_bytestream_gpu_Divakar'     1000    10        0.0066003
    'solution_bsxfun_cumsum_Divakar'             1000    10         0.001982
    'solution_bsxfun_sprintf_Divakar'            1000    10        0.0015578
    'solution_bsxfun_sprintf_gpu_Divakar'        1000    10        0.0046952
    'solution_eval_loops_CSTLink'                1000    10         0.011481
    'solution_loops_CSTLink'                     1000    10        0.0027254
    'solution_mex_Amro'                          1000    10        0.0022698
    'solution_mex_chappjc'                       1000    10        0.0006967
    'solution_mex_omp_Amro'                      1000    10        0.0015025
    'solution_mex_omp_chappjc'                   1000    10       0.00041463
    'solution_sscanf_Divakar'                    1000    10        0.0093785
    'solution_sscanf_char_LuisMendo'             1000    10        0.0038031
    'solution_textscan_sprintf_chappjc'          1000    10        0.0020323
    'func_eval_cellfun'                         10000    10         0.083676
    'func_eval_loop'                            10000    10         0.098798
    'func_eval_sprintf'                         10000    10          0.60429
    'func_eval_strjoin'                         10000    10          0.63656
    'func_sscanf_cellfun'                       10000    10          0.20675
    'func_sscanf_loop'                          10000    10           0.1088
    'func_sscanf_sprintf'                       10000    10         0.021725
    'func_sscanf_strjoin'                       10000    10         0.052341
    'func_str2num_cellfun'                      10000    10          0.24192
    'func_str2num_loop'                         10000    10          0.26538
    'func_str2num_sprintf'                      10000    10          0.53451
    'func_str2num_strjoin'                      10000    10          0.56759
    'func_textscan_cellfun'                     10000    10          0.25474
    'func_textscan_loop'                        10000    10          0.17402
    'func_textscan_sprintf'                     10000    10         0.018799
    'func_textscan_strjoin'                     10000    10          0.04965
    'solution_bsxfun_bytestream_Divakar'        10000    10         0.019165
    'solution_bsxfun_bytestream_gpu_Divakar'    10000    10         0.031283
    'solution_bsxfun_cumsum_Divakar'            10000    10         0.027986
    'solution_bsxfun_sprintf_Divakar'           10000    10         0.017761
    'solution_bsxfun_sprintf_gpu_Divakar'       10000    10         0.024821
    'solution_eval_loops_CSTLink'               10000    10          0.10885
    'solution_loops_CSTLink'                    10000    10         0.025136
    'solution_mex_Amro'                         10000    10         0.021374
    'solution_mex_chappjc'                      10000    10        0.0060774
    'solution_mex_omp_Amro'                     10000    10        0.0076461
    'solution_mex_omp_chappjc'                  10000    10         0.002058
    'solution_sscanf_Divakar'                   10000    10          0.10503
    'solution_sscanf_char_LuisMendo'            10000    10         0.035483
    'solution_textscan_sprintf_chappjc'         10000    10         0.018772
    'func_eval_cellfun'                         1e+05    10          0.85115
    'func_eval_loop'                            1e+05    10          0.97977
    'func_eval_sprintf'                         1e+05    10           6.2422
    'func_eval_strjoin'                         1e+05    10           6.5012
    'func_sscanf_cellfun'                       1e+05    10           2.0761
    'func_sscanf_loop'                          1e+05    10           1.0865
    'func_sscanf_sprintf'                       1e+05    10          0.22618
    'func_sscanf_strjoin'                       1e+05    10          0.53146
    'func_str2num_cellfun'                      1e+05    10           2.4041
    'func_str2num_loop'                         1e+05    10           2.6431
    'func_str2num_sprintf'                      1e+05    10              5.4
    'func_str2num_strjoin'                      1e+05    10           5.6967
    'func_textscan_cellfun'                     1e+05    10           2.5696
    'func_textscan_loop'                        1e+05    10           1.7175
    'func_textscan_sprintf'                     1e+05    10          0.19759
    'func_textscan_strjoin'                     1e+05    10          0.50314
    'solution_bsxfun_bytestream_Divakar'        1e+05    10          0.21884
    'solution_bsxfun_bytestream_gpu_Divakar'    1e+05    10          0.23607
    'solution_bsxfun_cumsum_Divakar'            1e+05    10          0.29511
    'solution_bsxfun_sprintf_Divakar'           1e+05    10          0.19882
    'solution_bsxfun_sprintf_gpu_Divakar'       1e+05    10          0.17923
    'solution_eval_loops_CSTLink'               1e+05    10           1.0943
    'solution_loops_CSTLink'                    1e+05    10           0.2534
    'solution_mex_Amro'                         1e+05    10          0.21575
    'solution_mex_chappjc'                      1e+05    10         0.060666
    'solution_mex_omp_Amro'                     1e+05    10         0.072168
    'solution_mex_omp_chappjc'                  1e+05    10         0.024385
    'solution_sscanf_Divakar'                   1e+05    10           1.0992
    'solution_sscanf_char_LuisMendo'            1e+05    10          0.36688
    'solution_textscan_sprintf_chappjc'         1e+05    10          0.19755

Upvotes: 7

Divakar
Divakar

Reputation: 221504

Benchmarking

Few things must be noted on the benchmark codes used for this benchmarking. This is same as Amro's benchmark codes with these changes -

  • The division by 1000 was added to all solutions to keep the consistency among solutions.
  • The error checking for validity is removed. It won't affect the runtime results anyway.
  • The datasize (number of rows) is extended to 1000000 and for that I had to keep the count of numbers in each cell as 8 to accommodate everything into memory. Also, I think this would present a fair comparison between vectorized and loopy solutions.
  • The mex based solutions are avoided, but I would love to see runtime comparisons against those too, if someone would be too kind to include the solutions included in this benchmark with those mex based ones.
  • For benchmarking GPU codes, gputimeit was used, instead of timeit.

The codes are packaged and available here. In it, use bench_script.m as the main function.

Benchmark Results

enter image description here

enter image description here

           func               nrows    ncols      time  
__________________________    _____    _____    ________

'all_loops'                   10000    8        0.026996
'char_and_sscanf_solution'    10000    8        0.034492
'eval_and_loops_solution'     10000    8         0.22548
'extIntsSprintfTextScan'      10000    8        0.036889
'func_eval_cellfun'           10000    8         0.16483
'func_eval_loop'              10000    8          0.1845
'func_sscanf_cellfun'         10000    8         0.40217
'func_sscanf_loop'            10000    8         0.19508
'func_sscanf_sprintf'         10000    8        0.027505
'func_sscanf_strjoin'         10000    8         0.11128
'func_str2num_cellfun'        10000    8          0.4976
'func_str2num_loop'           10000    8          0.5303
'func_textscan_cellfun'       10000    8           0.547
'func_textscan_loop'          10000    8          0.3752
'func_textscan_sprintf'       10000    8        0.036699
'func_textscan_strjoin'       10000    8         0.12059
'single_sscanf_solution'      10000    8         0.17122
'soln_bytstrm_bsxfun_gpu'     10000    8        0.023365
'soln_sprintf_bsxfun_gpu'     10000    8        0.019986
'solution_bytstrm_bsxfun'     10000    8        0.031165
'solution_cumsum_bsxfun'      10000    8        0.047445
'solution_sprintf_bsxfun'     10000    8        0.028417
'all_loops'                   50000    8         0.13444
'char_and_sscanf_solution'    50000    8          0.1753
'eval_and_loops_solution'     50000    8          1.1242
'extIntsSprintfTextScan'      50000    8          0.1871
'func_eval_cellfun'           50000    8         0.82261
'func_eval_loop'              50000    8         0.91632
'func_sscanf_cellfun'         50000    8          2.0088
'func_sscanf_loop'            50000    8         0.97656
'func_sscanf_sprintf'         50000    8         0.13891
'func_sscanf_strjoin'         50000    8         0.56368
'func_str2num_cellfun'        50000    8          2.4786
'func_str2num_loop'           50000    8          2.6377
'func_textscan_cellfun'       50000    8          2.7452
'func_textscan_loop'          50000    8          1.8249
'func_textscan_sprintf'       50000    8         0.18556
'func_textscan_strjoin'       50000    8         0.60935
'single_sscanf_solution'      50000    8         0.90871
'soln_bytstrm_bsxfun_gpu'     50000    8         0.10591
'soln_sprintf_bsxfun_gpu'     50000    8        0.079611
'solution_bytstrm_bsxfun'     50000    8         0.18875
'solution_cumsum_bsxfun'      50000    8         0.27233
'solution_sprintf_bsxfun'     50000    8         0.16467
'all_loops'                   80000    8         0.21602
'char_and_sscanf_solution'    80000    8         0.27855
'eval_and_loops_solution'     80000    8          1.7997
'extIntsSprintfTextScan'      80000    8         0.29733
'func_eval_cellfun'           80000    8          1.3171
'func_eval_loop'              80000    8          1.4647
'func_sscanf_cellfun'         80000    8          3.2232
'func_sscanf_loop'            80000    8          1.5664
'func_sscanf_sprintf'         80000    8         0.22136
'func_sscanf_strjoin'         80000    8         0.89605
'func_str2num_cellfun'        80000    8          3.9688
'func_str2num_loop'           80000    8          4.2199
'func_textscan_cellfun'       80000    8          4.3841
'func_textscan_loop'          80000    8          2.9181
'func_textscan_sprintf'       80000    8         0.29494
'func_textscan_strjoin'       80000    8         0.97383
'single_sscanf_solution'      80000    8          1.4542
'soln_bytstrm_bsxfun_gpu'     80000    8         0.15528
'soln_sprintf_bsxfun_gpu'     80000    8         0.11911
'solution_bytstrm_bsxfun'     80000    8         0.28552
'solution_cumsum_bsxfun'      80000    8         0.43238
'solution_sprintf_bsxfun'     80000    8         0.24801
'all_loops'                   1e+05    8         0.26833
'char_and_sscanf_solution'    1e+05    8         0.34617
'eval_and_loops_solution'     1e+05    8          2.2465
'extIntsSprintfTextScan'      1e+05    8         0.37322
'func_eval_cellfun'           1e+05    8           1.641
'func_eval_loop'              1e+05    8          1.8339
'func_sscanf_cellfun'         1e+05    8           4.024
'func_sscanf_loop'            1e+05    8          1.9598
'func_sscanf_sprintf'         1e+05    8         0.27558
'func_sscanf_strjoin'         1e+05    8          1.1193
'func_str2num_cellfun'        1e+05    8          4.9592
'func_str2num_loop'           1e+05    8          5.2634
'func_textscan_cellfun'       1e+05    8          5.6636
'func_textscan_loop'          1e+05    8           3.981
'func_textscan_sprintf'       1e+05    8         0.36947
'func_textscan_strjoin'       1e+05    8           1.208
'single_sscanf_solution'      1e+05    8          1.8665
'soln_bytstrm_bsxfun_gpu'     1e+05    8         0.19389
'soln_sprintf_bsxfun_gpu'     1e+05    8         0.14588
'solution_bytstrm_bsxfun'     1e+05    8         0.35404
'solution_cumsum_bsxfun'      1e+05    8         0.54906
'solution_sprintf_bsxfun'     1e+05    8         0.30324

System Config

MATLAB Version: 8.3.0.532 (R2014a)
Operating System: Ubuntu 14.04 LTS 64-bit
RAM: 4GB
CPU Model: IntelĀ® PentiumĀ® Processor E5400 (2M Cache, 2.70 GHz)
GPU Model: GTX 750Ti 2GB

configinfo.m output (important ones only) -

MATLAB accelerator enabled
MATLAB JIT: enabled
MATLAB assertions: disabled
MATLAB Desktop: enabled
Java JVM: enabled
CPU: x86 Family 6 Model 23 Stepping 10, GenuineIntel
Number of processors: 2
CPU speed is: 2700 MHz
RAM: 4046992 kB
Swap space: 9760764 kB
Number of cores: 2
Number of threads: 2

Upvotes: 2

Divakar
Divakar

Reputation: 221504

Approach #1

This first approach in a broad sense has two parts. The first one gets us one big long string that has all the characters from the cell array with underscores separating two cells, which is essentially implemented with cumsum. The second one based on bsxfun gets us the desired numeric output in the desired format.

The code would look something like this -

function out = solution_cumsum_bsxfun(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
tlens = sum(lens); %// Total number of characters [total of lengths]
cumlens = cumsum(lens); %// Cumulative lengths

%// Create an array of ones and zeros. 
%// The length of this array would be equal to sum of characters in all cells. 
%// The ones would be at the positions where new cells start, zeros otherwise
startpos_newcell(1,tlens)=0;
startpos_newcell(cumlens(1:end-1)+1)=1;

%// Calculate new positions for all characters in the cell array with 
%// places/positions left between two cells for putting underscores
newpos = (1:tlens) + cumsum(startpos_newcell);

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells
one_str(newpos) = [list_of_words{:}];
one_str(cumlens + [1:numel(cumlens)]') = '_'; %//'#

pos_us = find(one_str=='_'); %// positions of underscores
wordend_idx = pos_us - [1:numel(pos_us)]; %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create mask where digit characters are to be placed
mask = bsxfun(@ge,[1:max_wordlens]',[max_wordlens+1-wordlens]); %//'#

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1(max_wordlens,size(mask,2)) = 0;
num1(mask) = one_str(one_str~='_') - 48;

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
out = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#

return;

Approach #2

This a slight variation of approach #1 in that it does the job of first part with sprintf. Now, the idea came to me after seeing it in action in @chappjc's solution. I appreciate him on letting me use it in this modified version.

Here's the code -

function out = solution_sprintf_bsxfun(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells
one_str = sprintf('%s_',list_of_words{:});

pos_us = find(one_str=='_'); %// positions of underscores
wordend_idx = pos_us - [1:numel(pos_us)]; %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create mask where digit characters are to be placed
mask = bsxfun(@ge,[1:max_wordlens]',[max_wordlens+1-wordlens]); %//'#

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1(max_wordlens,size(mask,2)) = 0;
num1(mask) = one_str(one_str~='_') - 48;

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
out = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#

return;

Approach #3

This is an altogether new approach based on getByteStreamFromArray -

function out = solution_bytstrm_bsxfun(list_of_words)

allchars = [list_of_words{:}];
digits_array = getByteStreamFromArray(allchars);

%// At my 32-bit system getByteStreamFromArray gets the significant digits 
%// from index 65 onwards. Thus, we need to crop/index it accordingly
digits_array = digits_array(65:65+numel(allchars)-1) - 48;
% --------------------------------------------------------------------
N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
cumlens = cumsum(lens)'; %//'# Cumulative lengths
% ----------------------------------------------------------
pos_us = find(digits_array==47);
starts = [[1 cumlens(1:end-1)+1];reshape(pos_us+1,N-1,[])];
ends = [reshape(pos_us-1,N-1,[]) ; cumlens];
gaps = ends(:) - starts(:) + 1;

maxg = max(gaps);
mask1 = bsxfun(@lt,maxg-gaps',[1:maxg]');

num_array(maxg,numel(lens)*N)=0;
num_array(mask1) = digits_array(digits_array~=47);
out = reshape(10.^(maxg-4:-1:-3)*num_array,N,[]).';

return;

GPU versions of the above approaches are listed next.

Approach #4

function out = soln_sprintf_bsxfun_gpu(list_of_words)

N = 1 + sum(list_of_words{1}=='_'); %// Number of "words" per cell

%// Create one long string that has all the characters from the cell arary
%// with underscores separating cells. Now this one uses sprintf as
%// firstly proposed in @chappjc's solution and he has agreed to let me use
%// it, so appreciating his help on this.
digits_array = gpuArray(single(sprintf('%s_',list_of_words{:})));
digits_array = digits_array - 48;

mask_us = digits_array==47; %// mask of underscores
pos_us = find(mask_us);

wordend_idx = pos_us - gpuArray.colon(1,numel(pos_us)); %// word end indices
wordlens = [wordend_idx(1) diff(wordend_idx)]; %// word lengths
max_wordlens = max(wordlens); %// maximum word length

%// Create a numeric array with columns for each word and each row holding a digit.
%// The most significant digits to least significant going from top to bottom
num1 =  single(zeros(max_wordlens,numel(pos_us),'gpuArray')); %//'
num1(bsxfun(@ge,gpuArray.colon(1,max_wordlens)',...
    max_wordlens+1-single(wordlens))) = digits_array(~mask_us); %//'

%// Finally get the desired output converting each column to a single
%// number, reshaping to desired size and scaling down by a factor of 1000
outg = reshape(10.^(max_wordlens-4:-1:-3)*num1,N,[]).'; %//'#
out = gather(outg);

return;

Approach #5

function out = soln_bytstrm_bsxfun_gpu(list_of_words)

us_ascii_num = 95;
allchars = [list_of_words{:}];

%// At my 32-bit system getByteStreamFromArray gets the significant digits 
%// from index 65 onwards. Thus, we need to crop/index it accordingly
alldigits = getByteStreamFromArray(allchars);
digits_array1 = gpuArray(alldigits(65:65+numel(allchars)-1));
% --------------------------------------------------------------------
lens = cellfun('length',list_of_words); %// No. of chars [lengths] in each cell
N = sum(digits_array1(1:lens(1))==us_ascii_num)+1; %// Number of "words" per cell
lens = gpuArray(lens);
cumlens = cumsum(lens)'; %//'# Cumulative lengths
% ----------------------------------------------------------
mask_us = digits_array1==us_ascii_num; %// mask of underscores
pos_us = find(mask_us);
starts = [[1 cumlens(1:end-1)+1];reshape(pos_us+1,N-1,[])];
ends = [reshape(pos_us-1,N-1,[]) ; cumlens];
gaps = ends(:) - starts(:) + 1;

maxg = max(gaps);
mask1 = bsxfun(@lt,maxg-gaps',[1:maxg]');

num_array = zeros(maxg,numel(lens)*N,'gpuArray'); %//'
num_array(mask1) = digits_array1(~mask_us)-48;
out = reshape(10.^(maxg-4:-1:-3)*num_array,N,[]).'; %//'
out = gather(out);

return;

Upvotes: 5

chappjc
chappjc

Reputation: 30579

I'd apply sprintf with a comma-separated list input to generate a single, fully-delimited 1D string (with delimiter at end of each line/string), and then use textscan to pull out the integers:

extractIntsSprintfTextScan.m

function M = extractIntsSprintfTextScan(list_of_words)

% optimized lines, see below for explanation of each op
M = textscan(sprintf('%s_',C{:}),'%.0f', 'Delimiter','_');
M = reshape(M{1}, [], numel(C)).';

% cell to string, adding _ suffix
% s = sprintf('%s_',list_of_words{:});

% extract integers given _ as delimiter
% C = textscan(s,'%u','Delimiter','_','CollectOutput',1); % can also use %d for signed
% C = textscan(s,'%.0f_','CollectOutput',1); % a little faster to put _ in the pattern

% matrix form
% M = reshape(C{1},[],numel(list_of_words)).'/1000; % remove transpose for faster output

end

Benchmarks (previous timings now stale, see sprintf+textscan solution in Amro's benchmark post instead)

Machine

MATLAB R2014b 64-bit
Windows 7 64-bit
24 GB RAM
Dual Xeon X5550 (2.67GHZ, 8 physical cores)

UPDATE: Change output type to double from an integer type. Re-run benchmarks.
UPDATE 2: Change format specifier from '%f' to '%.0f', maybe allowing it to speed up the scan since no decimals are requested. Increase N = 100e3; (see the GitHub Gist).
UPDATE 3: New benchmarks with version 2 of CST-Link's timing functions (see new GitHub Gist with suggested usage of referee_timing.m).
UPDATE 4: Add Luis' solution. Note that the results fluctuate since the data is randomly generated. UPDATE 5: Optimization tweaks -- see Amro's benchmarking post.

Upvotes: 3

user2271770
user2271770

Reputation:

(Most recent Proposal)

This kind of eliminates the last MATLAB-specific part of my anti-MATLAB solution; pure byte-scanning, loop-in-loop, mem-allocating stuff:

%'all_loops.m'
function array_of_numbers = all_loops(list_of_words)

        %'Precalculate important numbers'
        n_numbers = 1 + sum(list_of_words{1}=='_');
        n_words   = numel(list_of_words);

        %'Allocate memory'
        array_of_numbers = zeros(n_numbers, n_words);
        slice_of_numbers = zeros(n_numbers, 1);

        %'Loop trough chunks of cell array'
        for k = 1:n_words
                str     = list_of_words{k};
                pos_max = size(str, 2);
                value   = 0;
                index   = 1;

                for pos = 1:pos_max
                        if str(pos) == '_'
                                slice_of_numbers(index) = value;
                                value                   = 0;
                                index                   = index + 1;
                        else
                                value = 10*value + (str(pos) - '0');
                        end;
                        slice_of_numbers(index) = value; %'almost forgot'
                end;

                array_of_numbers(:, k) = slice_of_numbers;
        end;

         %'This is a waste of memory, but is kind of required'
        array_of_numbers = transpose(array_of_numbers / 1000);

end

(Benchmarking) For the following configuration (obtained from configinfo.m)

MATLAB configuration information                    CPU: x86 Family 6 Model 58 Stepping 9, GenuineIntel
This data was gathered on: 24-Oct-2014 14:19:31     The measured CPU speed is: 2600 MHz
MATLAB version: 7.14.0.739 (R2012a)                 Number of processors: 4
MATLAB root: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx       RAM: 3289 MB
MATLAB accelerator enabled                          Swap space: 6576 MB
MATLAB JIT: enabled                                 Microsoft Windows 7
MATLAB assertions: disabled                         Number of cores: 2
MATLAB Desktop: enabled                             Number of threads: 2

the following results were obtained (3rd run):

Generating 10000-words test case...
Timing 1000000-words test case...
  approach4: Error - Out of memory. Type HELP MEMORY for your options.
  approach1: Error - Out of memory. Type HELP MEMORY for your options.
  single_sscanf_solution: Error - Out of memory. Type HELP MEMORY for your options.
  char_and_sscanf_solution: 5.076296[s]
  extractIntsSprintfTextScan: 4.328066[s]
  all_loops: 1.795730[s]
  eval_and_loops_solution: 10.027541[s]
Generating 100000-words test case...
Timing 100000-words test case...
  approach4: 0.252107[s]
  approach1: 0.370727[s]
  single_sscanf_solution: 1.364936[s]
  char_and_sscanf_solution: 0.515599[s]
  extractIntsSprintfTextScan: 0.444586[s]
  all_loops: 0.179575[s]
  eval_and_loops_solution: 1.010240[s]
Generating 10000-words test case...
Timing 10000-words test case...
  approach4: 0.026642[s]
  approach1: 0.039550[s]
  single_sscanf_solution: 0.136711[s]
  char_and_sscanf_solution: 0.049708[s]
  extractIntsSprintfTextScan: 0.042608[s]
  all_loops: 0.017636[s]
  eval_and_loops_solution: 0.099111[s]

You may notice a difference in the last step, between "Generating 10000..." and "Timing 1000000..."; in order to decimate the occupied memory (otherwise everything would fail on generating the test case) i used repmat(referee_test_case(1e4), 100, 1) instead of referee_test_case(1e6).

Upvotes: 2

Amro
Amro

Reputation: 124543

Challenge accepted! Here is my fastest implementation so far, and yes it's a C++ MEX-file :)

solution_mex.cpp

#include "mex.h"
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

namespace {

// get i-th string from cell-array of strings
std::string getString(const mxArray *cellstr, mwIndex idx)
{
    mxArray *arr = mxGetCell(cellstr, idx);
    if (arr == NULL) mexErrMsgIdAndTxt("mex:err", "null/uninitialized");
    if (!mxIsChar(arr)) mexErrMsgIdAndTxt("mex:err", "not a string");
    char *cstr = mxArrayToString(arr);
    if (cstr == NULL) mexErrMsgIdAndTxt("mex:err", "null");
    std::string str(cstr);
    mxFree(cstr);
    return str;
}

// count of numbers in char-delimited string
mwSize count_numbers(const std::string &s)
{
    return std::count(s.begin(), s.end(), '_') + 1;
}

// parse numbers
template <typename T>
void parseNumbers(const mxArray *cellstr, const mwSize len, std::vector<T> &v)
{
    // cell-array of strings
    std::vector<std::string> vs;
    vs.reserve(len);
    for (mwIndex idx=0; idx<len; ++idx) {
        vs.push_back(getString(cellstr, idx));
    }

    // join vector of strings into one
    std::stringstream ss;
    std::copy(vs.begin(), vs.end(),
        std::ostream_iterator<std::string>(ss, "_"));

    // split string into numbers (separated by single-char delimiter)
    T num;
    while (ss >> num) {
        v.push_back(num);
        ss.ignore(1);
    }
}

};

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // validate inputs
    if (nrhs!=1 || nlhs>1) mexErrMsgIdAndTxt("mex:err", "wrong num args");
    if (!mxIsCell(prhs[0])) mexErrMsgIdAndTxt("mex:err", "not cell");

    // allocate output matrix
    mwSize len = mxGetNumberOfElements(prhs[0]);
    mwSize sz = (len > 0) ? count_numbers(getString(prhs[0],0)) : 0;
    plhs[0] = mxCreateNumericMatrix(sz, len, mxDOUBLE_CLASS, mxREAL);
    if (plhs[0] == NULL) mexErrMsgIdAndTxt("mex:err", "null");
    if (len == 0 || sz == 0) return;

    // parse cell array into numbers
    std::vector<int> v;
    v.reserve(len*sz);
    parseNumbers(prhs[0], len, v);
    if (v.size() != (len*sz)) mexErrMsgIdAndTxt("mex:err", "wrong size");

    // copy numbers into output matrix
    std::copy(v.begin(), v.end(), mxGetPr(plhs[0]));
}

The code is straightforward to read; I'm using the standard STL library where possible (no dirty tricks), plus there's plenty of checks and input validation, so it should robust as long you follow the input format described in the question.

I'll update with some benchmarks later, but for now you can test it on your own and compare against the other solutions...

Note the above MEX-function returns a matrix of size n_numbers * n_words, so you'll want to transpose the result.

Here is a wrapper M-function you can use to run under the referee program:

solution_mex.m

function array_of_numbers = mex_solution(list_of_words)
    array_of_numbers = solution_mex(list_of_words).';
end

EDIT#1

Let's simplify the code a bit; this version uses much less memory by processing one string at a time, and placing the result directly into the output matrix:

solution_mex.cpp

#include "mex.h"
#include <string>
#include <sstream>
#include <algorithm>

namespace {
std::string getString(const mxArray *cellstr, const mwIndex idx)
{
    // get i-th string from cell-array of strings
    mxArray *arr = mxGetCell(cellstr, idx);
    if (!arr || !mxIsChar(arr)) mexErrMsgIdAndTxt("mex:err", "not a string");
    char *cstr = mxArrayToString(arr);
    if (cstr == NULL) mexErrMsgIdAndTxt("mex:err", "null");
    std::string str(cstr);
    mxFree(cstr);
    return str;
}

mwSize count_numbers(const std::string &s)
{
    // count of numbers in char-delimited string
    return std::count(s.begin(), s.end(), '_') + 1;
}
};

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // validate inputs
    if (nrhs!=1 || nlhs>1) mexErrMsgIdAndTxt("mex:err", "wrong num args");
    if (!mxIsCell(prhs[0])) mexErrMsgIdAndTxt("mex:err", "not a cell");

    // determine sizes
    const mxArray *cellstr = prhs[0];
    const mwSize n_words = mxGetNumberOfElements(cellstr);
    const mwSize n_nums = (n_words > 0) ?
        count_numbers(getString(cellstr,0)) : 0;

    // allocate output matrix
    plhs[0] = mxCreateDoubleMatrix(n_nums, n_words, mxREAL);
    if (plhs[0] == NULL) mexErrMsgIdAndTxt("mex:err", "null");
    if (n_words == 0 || n_nums == 0) return;
    double *out = mxGetPr(plhs[0]);

    // extract numbers from strings
    for (mwIndex idx=0, i=0; idx<n_words; ++idx) {
        std::istringstream ss(getString(cellstr, idx));
        int num;
        while(ss >> num) {
            out[i++] = num;
            ss.ignore(1);
        }
    }
}

EDIT#2 (OpenMP)

Next step, let's make it multi-threaded; We can do this implicitly using OpenMP by just adding two lines of code! I am parallelizing over each string.

First we add the omp parallel for pragma, then we make the index variable i private to each thread, that way threads know the column starting index.

In the previous code from edit#1, just replace the last loop with:

// extract numbers from strings
#pragma omp parallel for
for (mwIndex idx=0; idx<n_words; ++idx) {
    mwIndex i = idx*n_nums;  // starting index for i-th column
    std::istringstream ss(getString(cellstr, idx));
    int num;
    while(ss >> num) {
        out[i++] = num;
        ss.ignore(1);
    }
}

I'm running R2014a on Windows x64, and I tried compiling with both VS2013 and MinGW-w64 GCC 4.9.1. Let me point out that the GCC-compiled version is way faster than all solutions so far:

% compile with MinGW-w64
>> mex -largeArrayDims -f mingwopts.bat solution_mex.cpp -output solution_mex_gcc

% set appropriate number of threads
>> setenv('OMP_NUM_THREADS','4');

% quick benchmark
>> timeit(@() solution_mex_gcc(repmat({'02_04_04_52_23_14_54_672_0'},1e6,1)).')
ans =
    0.6658

Upvotes: 2

Luis Mendo
Luis Mendo

Reputation: 112659

The following solution seems to work:

function array_of_numbers = char_and_sscanf_solution(list_of_words)

s = char(list_of_words);
s(s==95) = 32; %// 95 is '_'; 32 is ' '
s = [ s repmat(32, numel(list_of_words), 1) ];
array_of_numbers = reshape(sscanf(s.','%i '), [], numel(list_of_words)).'/1000;

Benchmarking results using referee_timing.m and referee_test_case.m:

Generating 10000-words test case...
Timing 10000-words test case...
  eval_and_loops_solution: 0.190642[s]
  single_sscanf_solution: 0.234413[s]
  approach1: 0.073901[s]
  char_and_sscanf_solution: 0.068311[s]

Generating 100000-words test case...
Timing 100000-words test case...
  eval_and_loops_solution: 1.957728[s]
  single_sscanf_solution: 2.426764[s]
  approach1: 0.766020[s]
  char_and_sscanf_solution: 0.706387[s]

Generating 1000000-words test case...
Timing 1000000-words test case...
  eval_and_loops_solution: 18.975746[s]
  char_and_sscanf_solution: 7.147229[s]

Computer information:

Matlab R2010b
Windows 7 Home Premium 64 bits Service Pack 1
Pentium(R)Dual Core CPU T4300 2.10 GHz
4 GB RAM

Upvotes: 3

Related Questions