Søren J

Reputation: 31

Faster num2str for integers?

I have an array called skj. skj contains 2 million rows of numbers (2000000x1 uint32).

I want to compute the following

string_skj = num2str(skj);

When I run the above line, it takes about 1 minute. Is there a faster way of doing it?

Upvotes: 3

Views: 2649

Answers (5)

Luis Mendo

Reputation: 112699

The following is much faster on my machine:

y = dec2base(skj,10);

Here's a quick test:

>> skj = uint32(2^32*rand(1e6,1)); %// random data

>> tic, y = num2str(skj); toc
Elapsed time is 22.823348 seconds.

>> tic, z = dec2base(skj,10); toc
Elapsed time is 1.235942 seconds.

Note that using dec2base gives leading zeros instead of leading spaces.

>> y(1:5,:)
ans =
3864067979
1572155259
1067755677
2492696731
 561648530

>> z(1:5,:)
ans =
3864067979
1572155259
1067755677
2492696731
0561648530
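
If the zero padding is unwanted, one possible way to blank it out afterwards is sketched below. This is an illustrative addition and is not included in the timing above; the variable name isLeading and the small example input are assumptions made for the sketch.

```matlab
% Small example input; in practice z would be dec2base(skj,10)
z = dec2base(uint32([3864067979; 561648530; 7; 0]), 10);

% A character is a leading zero if no nonzero digit has appeared yet
% in its row, i.e. the running count of nonzero digits is still 0
isLeading = cumsum(double(z ~= '0'), 2) == 0;

% Always keep the last column so that the value 0 still prints as '0'
isLeading(:, end) = false;

% Overwrite the leading zeros with blanks, as num2str would pad
z(isLeading) = ' ';
```

The extra pass costs some time on top of dec2base, so it is only worth doing if the blanks are really needed.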

Upvotes: 4

Daniel

Reputation: 36710

By implementing int2str yourself, you can beat the performance of the built-in function by a wide margin.

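function [ o ] = myfastint2str( x )
%Convert a column vector of non-negative integers to a zero-padded char matrix
maxvalue=max(x(:));
%maxvalue=intmax(class(x));%Alternative implementation based on class
%number of decimal digits of the largest value (at least 1, so an all-zero input still yields '0')
required_digits=max(1,ceil(log(double(maxvalue+1))/log(10)));
o=repmat(x(1)*0,size(x,1),required_digits);%initialize array of required size
for c=size(o,2):-1:1
   o(:,c)=mod(x,10);%extract the last decimal digit
   x=(x-o(:,c))/10;%drop that digit; the division is exact
end
o=char(o+'0');%map the digits 0..9 to the characters '0'..'9'
end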

For the example input, my function required less than 0.15 seconds, while both int2str and num2str took about 15 seconds. The output is slightly different as it generates leading zeros instead of blanks.
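
For illustration, the same digit extraction can also be written without the loop. The following is only a sketch and has not been timed against the function above; it assumes a column vector of non-negative integers small enough to be represented exactly as double (true for uint32), and the names n, p and d are chosen here for the example.

```matlab
x = uint32([7; 42; 1234]);           % small example input

n = numel(sprintf('%d', max(x)));    % digits in the largest value
p = 10.^(n-1:-1:0);                  % [1000 100 10 1] for n = 4

% Divide every value by every power of ten, then keep the last digit
% of each quotient; bsxfun expands the column x against the row p
d = mod(floor(bsxfun(@rdivide, double(x), p)), 10);

o = char(d + '0');                   % digits 0..9 -> characters '0'..'9'
```

This version builds the whole N-by-n matrix of doubles at once, so it trades memory for the absence of a loop.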

Upvotes: 4

Hennadii Madan

Reputation: 1623

Warning: the output does not have the same format as that of num2str, but it may be workable.

Edit: A super fast 'solution'. The output is not a char matrix with one row per number, but a single string with line breaks as separators. If you print it, it will look the same.

>> tic;a = sprintf('%d\n',skj);toc
Elapsed time is 0.422143 seconds
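
If one row per number is required, a fixed-width format makes every number the same length, so the long string can be reshaped into a char matrix. This is a sketch under that assumption and was not part of the timing above; the names w, fmt and s are illustrative.

```matlab
skj = uint32([5; 42; 1234]);            % small example input

w   = numel(sprintf('%d', max(skj)));   % digits in the largest value
fmt = sprintf('%%%dd', w);              % builds the format, e.g. '%4d'

% Every number now occupies exactly w characters (padded on the left
% with blanks), so the string can be cut into rows of length w
s = reshape(sprintf(fmt, skj), w, []).';
```

Here s is a 3x4 char matrix with the numbers right-aligned.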

Edit: Old 'solution'

Try transposing before and after, like num2str(skj.').'

>> skj = ones(2000000,1,'uint32');
>> tic;num2str(skj);toc
Elapsed time is 23.305860 seconds.
>> tic;num2str(skj.');toc
Elapsed time is 1.044551 seconds.

Upvotes: 0

IKavanagh

Reputation: 6187

Hennadii Madan's answer got me thinking about whether there was a way to do this for column vectors more efficiently than the standard MATLAB num2str (or int2str), and I've come up with 2 solutions that do.

EDIT: And after all that work @Luis Mendo comes in and blows it all out of the water :'(

EDIT: Now @Daniel has improved on all of the previous options again!


Given our column vector, V, as

V = uint32(randi(100, 200000, 1));

we can achieve the same result as

A = num2str(V);

with

B = char(strsplit(num2str(V.')).');

or without the error checking of num2str

C = char(strsplit(sprintf('%d\n', V)).');
C = C(1:end-1, :); % Remove the blank last row left by the trailing '\n'

B and C are slightly different to A. num2str pre-pads with a space, ' ', whilst B and C post-pad with a space.

In the benchmarks below, D and E are pre-padded with zeros and so do not match A, B or C exactly.


Benchmarks

-----num2str() on row vector [Original]-----
Elapsed time is 3.501976 seconds.
  Name           Size              Bytes  Class    Attributes

  A         200000x3             1200000  char               

-----num2str() on column vector [IKavanagh modified from Hennadii Madan]-----
Elapsed time is 0.660878 seconds.
  Name           Size              Bytes  Class    Attributes

  B         200000x3             1200000  char               

-----sprintf() on row vector [IKavanagh]-----
Elapsed time is 0.582472 seconds.
  Name           Size              Bytes  Class    Attributes

  C         200000x3             1200000  char               

-----dec2base() on row vector [Luis Mendo]-----
Elapsed time is 0.042563 seconds.
  Name           Size              Bytes  Class    Attributes

  D         200000x3             1200000  char



-----myfastint2str() on row vector [Daniel]-----
Elapsed time is 0.011894 seconds.
  Name           Size              Bytes  Class    Attributes

  E         200000x3             1200000  char 

Code

clear all
close all
clc

V = uint32(randi(100, 200000, 1));

for k = 1:50000
    tic(); elapsed = toc(); % Warm up tic/toc
end

disp('-----num2str() on row vector [Original]-----');
tic;
A = num2str(V);
toc, whos A

disp('-----num2str() on column vector [IKavanagh modified from Hennadii Madan]-----');
tic;
B = char(strsplit(num2str(V.')).');
toc, whos B

disp('-----sprintf() on row vector [IKavanagh]-----');
tic;
C = char(strsplit(sprintf('%d\n', V)).');
C = C(1:end-1, :); % Remove the blank last row left by the trailing '\n'
toc, whos C

disp('-----dec2base() on row vector [Luis Mendo]-----');
tic;
D = dec2base(V, 10);
toc, whos D

disp('-----myfastint2str() on row vector [Daniel]-----');
tic;
E = myfastint2str(V);
toc, whos E

Upvotes: 5

gariepy

Reputation: 3674

If you really need to increase speed, have you considered writing a MEX function in C? It's a little bit complicated, but it's worth investing the time if you have some small routines that can easily be coded in C/C++. Once compiled, the MEX function can be called from the MATLAB command prompt, just like a .m function.

See http://www.mathworks.com/help/matlab/call-mex-files-1.html for more details.

Upvotes: 1
