Sara Savio

Reputation: 59

Fast string splitting

I read a series of lines (over 2700) from a file, all of this form:

A = '1; 23245675; -234567; 123456; ...; 0'

A is a character string that uses ; as the delimiter between data values.

To split the string, I first used the strsplit function, but it was too slow. Then I switched to regexp like this:

regexp(A,';','split')

Is there an even faster function than regexp?

Upvotes: 1

Views: 1505

Answers (2)

EBH

Reputation: 10440

Being a built-in function [1], textscan is probably the fastest option:

result = textscan(A,'%f','Delimiter',';');  % A is a char vector; result{1} holds the values as a column of doubles
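With the '%f' format, textscan does more than split: it converts the fields to numbers and wraps them in a 1x1 cell, so the values end up in result{1}. regexp, strsplit and split return the pieces as text instead, which would still need a conversion such as str2double. The sketch below shows both outputs. It assumes A holds a single line as a char row vector, like the one in the question:

```matlab
% One line of data, as in the question (assumed to be a char row vector)
A = '1; 23245675; -234567; 123456; 0';

% textscan parses the numbers directly; its output is a 1x1 cell
C = textscan(A, '%f', 'Delimiter', ';');
values = C{1};                        % 5x1 column vector of doubles

% regexp only splits: the pieces are char vectors (with leading spaces)
parts = regexp(A, ';', 'split');      % 1x5 cell array of char vectors
valuesFromParts = str2double(parts);  % 1x5 double; str2double ignores surrounding spaces
```

This also means the comparison is not exactly like-for-like. textscan performs the numeric conversion inside its timed call, but the other three methods would need an extra str2double step to produce the same numbers.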

Here is a little benchmark to show that:

A = repmat('1; 23245675; -234567; 123456; 0',1,100000); % a long string
regexp_time = timeit(@ () regexp(A,';','split'))
strsplit_time = timeit(@ () strsplit(A,';'))
split_time = timeit(@ () split(A,';'))
textscan_time = timeit(@ () textscan(A,'%f','Delimiter',';'))

The result:

regexp_time =
      0.33054
strsplit_time =
      0.45939
split_time =
      0.24722
textscan_time =
     0.057712

textscan is the fastest, about 4.3 times faster than the next-fastest method (split).

It remains the fastest option regardless of the length of the string being split (note the log scale of the x-axis):

[Figure: benchmark of string splitting versus string length, log-scale x-axis]
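The plot is not reproduced here. The following sketch shows how a sweep over string lengths could be generated with timeit. The repetition counts in nReps are illustrative assumptions and are not the values behind the original figure:

```matlab
% Hypothetical sweep: repetition counts chosen for illustration only
base  = '1; 23245675; -234567; 123456; 0';
nReps = [1 10 100 1000 10000 100000];
t = zeros(numel(nReps), 4);          % one row per string length, one column per method

for k = 1:numel(nReps)
    A = repmat(base, 1, nReps(k));   % the anonymous functions below capture this A
    t(k,1) = timeit(@() regexp(A, ';', 'split'));
    t(k,2) = timeit(@() strsplit(A, ';'));
    t(k,3) = timeit(@() split(A, ';'));                       % split requires R2016b or later
    t(k,4) = timeit(@() textscan(A, '%f', 'Delimiter', ';'));
end

strLen = nReps * numel(base);        % string length in characters
semilogx(strLen, t, '-o');           % log-scale x-axis, one line per method
xlabel('String length (characters)');
ylabel('Time (s)');
legend('regexp', 'strsplit', 'split', 'textscan', 'Location', 'northwest');
```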


[1] "A built-in function is part of the MATLAB executable. MATLAB does not implement these functions in the MATLAB language. Although most built-in functions have a .m file associated with them, this file only supplies documentation for the function." (from the MATLAB documentation)

Upvotes: 2

Banghua Zhao

Reputation: 1556

The possible splitting functions I can think of are regexp, strsplit, and split.

I compared their performance on a large string. The results show that split is slightly faster than regexp, while strsplit is about 2 times slower than regexp.

Here is how I compared them:

First, create a large string A (around 16 million fields) based on the line in your question.

A = '1; 23245675; -234567; 123456; 0';
for ii=1:22
    A = strcat(A,A);
end
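The "around 16 million" figure comes from the way the string is built. Each doubling glues the trailing 0 of one copy to the leading 1 of the next (giving 01), so the delimiter count doubles while the field count goes from f to 2f - 1. After 22 doublings, that gives 4*2^22 + 1 = 16,777,217 fields. The quick check below counts the delimiters. It builds the string with [A A], which is equivalent to strcat(A,A) here because A has no trailing whitespace for strcat to strip:

```matlab
A = '1; 23245675; -234567; 123456; 0';   % 31 characters, 4 delimiters, 5 fields
for ii = 1:22
    A = [A A];                 % same as strcat(A,A) here: A ends in '0', not whitespace
end

nDelims = nnz(A == ';');       % 4*2^22 delimiters
nFields = nDelims + 1          % 4*2^22 + 1 = 16777217 fields
```

Note that A ends up with about 130 million characters (roughly 260 MB, since MATLAB stores each char in 2 bytes). Keep this in mind when running these benchmarks.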

Option 1: regexp

tic
regexp(A,';','split');
toc

Elapsed time is 12.548295 seconds.

Option 2: strsplit

tic
strsplit(A,';');
toc

Elapsed time is 23.347392 seconds.

Option 3: split

tic
split(A,';');
toc

Elapsed time is 9.678433 seconds.

So split may speed things up a little, but the improvement is not dramatic.
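Because the three functions are being timed against each other, it helps to confirm that they produce the same pieces. Below is a small check on the sample line from the question. The only assumption is that A is a char row vector. Note that split returns a column rather than a row:

```matlab
A = '1; 23245675; -234567; 123456; 0';

p1 = regexp(A, ';', 'split');   % 1x5 cell array of char vectors
p2 = strsplit(A, ';');          % 1x5 cell array of char vectors
p3 = split(A, ';');             % 5x1 cell array of char vectors (R2016b or later)

% Same pieces, once split's column output is reshaped into a row
sameResult = isequal(p1, p2, p3(:).')
```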

Upvotes: 0
