BallzofFury

Reputation: 241

Datenum function is slow. What can I do?

I'm trying to convert a character array (200,000 rows) into Matlab serial date numbers.
The format is '01/07/2015 00:00:59'.

This takes an incredibly long time, and the only tips I can find online are for Matlab. Any ideas how I can speed this up?

Upvotes: 0

Views: 877

Answers (1)

Hoki

Reputation: 11812

You can call datenum with a date-vector input, i.e. datenum(datevector).

It is much faster than the string parsing. I use this trick whenever I have to import long date/time data (which is nearly every day).

It consists of passing an m-by-6 (or m-by-3) matrix of type double whose columns hold [yyyy mm dd HH MM SS] (or just [yyyy mm dd]).
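As a quick illustration (a minimal sketch with two made-up timestamps, not part of the benchmark further down), a date-vector matrix and the equivalent strings should give the same serial dates:

```matlab
%// One row per timestamp, columns are [year month day hour minute second]
V = [2015 7 31 15 10 13 ;
     2015 7 31 15 10 14] ;
dtVec = datenum(V) ;

%// The same timestamps parsed from text (the slow path)
S = ['31/07/2015 15:10:13' ; '31/07/2015 15:10:14'] ;
dtStr = datenum(S, 'dd/mm/yyyy HH:MM:SS') ;

%// Both paths should agree to within floating-point rounding
maxDiff = max(abs(dtVec - dtStr)) ;
fprintf('Largest difference: %g days\n', maxDiff) ;
```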

In other words, instead of letting Matlab/Octave parse the strings, you read the numbers out of them yourself in whichever way you prefer (textscan, fscanf, sscanf, ...) and then pass those numbers to datenum.
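For instance, here is a rough sketch of the same idea using sscanf rather than textscan. The two sample timestamps and the variable names (padded, flat) are my own for illustration, and I flatten the char matrix into a single row explicitly so the call does not depend on how sscanf treats matrix input:

```matlab
%// Sample data: a char matrix with one fixed-width timestamp per row
strDate = ['31/07/2015 15:10:13' ; '01/08/2015 00:00:59'] ;

%// Append a space to each row, then flatten row by row into one long string
padded = [strDate repmat(' ', size(strDate,1), 1)] ;
flat   = reshape(padded.', 1, []) ;

%// Read six numbers per timestamp; sscanf fills a 6-by-N matrix column by column
M = sscanf(flat, '%f/%f/%f %f:%f:%f', [6 Inf]).' ;

%// Reorder [dd mm yyyy HH MM SS] to [yyyy mm dd HH MM SS] and convert
dt = datenum(M(:, [3 2 1 4 5 6])) ;
```

The appended space matters: without it, the seconds of one row would run straight into the day of the next row once the matrix is flattened.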

In the example below I generated a long array (86401x19) of date strings as sample data:

>> strDate(1:5,:)
ans =
31/07/2015 15:10:13
31/07/2015 15:10:14
31/07/2015 15:10:15
31/07/2015 15:10:16
31/07/2015 15:10:17

To convert that to serial dates faster than the conventional way, I use:

strDate = [strDate repmat(' ',size(strDate,1),1)] ; %// add a whitespace at the end of each line
M = textscan( strDate.' , '%f/%f/%f %f:%f:%f'  ) ;  %'// read each value independently
M = cell2mat(M) ;                                   %// convert to matrix
M = M(:,[3 2 1 4 5 6]) ;                            %// reorder columns

dt = datenum(M ) ;                                  %// convert to serial date

This should speed things up in Matlab, and I am pretty sure it will help in Octave too. To quantify the gain, at least in Matlab, here's a quick benchmark:

function test_datenum

d0 = now ;
d = (d0:1/3600/24:d0+1).' ; %// 1 day worth of date (one per second)

strDate = datestr(d,'dd/mm/yyyy HH:MM:SS') ; %'// generate the string array

fprintf('Time with automatic date parsing: %f\n' , timeit(@() datenum_auto(strDate)) )
fprintf('Time with customized date parsing: %f\n', timeit(@() datenum_preparsed(strDate)) )


function dt = datenum_auto(strDate)
    dt = datenum(strDate,'dd/mm/yyyy HH:MM:SS') ;       %// let Matlab/Octave do the parsing


function dt = datenum_preparsed(strDate)
    strDate = [strDate repmat(' ',size(strDate,1),1)] ; %// add a whitespace at the end of each line
    M = textscan( strDate.' , '%f/%f/%f %f:%f:%f'  ) ;  %'// read each value independently
    M = cell2mat(M) ;                                   %// convert to matrix

    M = M(:,[3 2 1 4 5 6]) ;                            %// reorder columns

    dt = datenum(M ) ;                                  %// convert to serial date

On my machine, it yields:

>> test_datenum
Time with automatic date parsing: 0.614698
Time with customized date parsing: 0.073633
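Before trusting the speed-up, it may be worth confirming that both routes return the same serial dates. A minimal sketch, assuming hourly sample timestamps of my own choosing and flattening the char matrix into a single row before calling textscan (the variable names are illustrative only):

```matlab
%// Hypothetical sample data: one timestamp per hour over a single day
d = (datenum(2015,7,31) : 1/24 : datenum(2015,8,1)).' ;
strDate = datestr(d, 'dd/mm/yyyy HH:MM:SS') ;

%// Route 1: let datenum parse the strings
dtAuto = datenum(strDate, 'dd/mm/yyyy HH:MM:SS') ;

%// Route 2: parse the numbers first, then pass a date-vector matrix
padded = [strDate repmat(' ', size(strDate,1), 1)] ;
M = cell2mat(textscan(reshape(padded.', 1, []), '%f/%f/%f %f:%f:%f')) ;
dtPre = datenum(M(:, [3 2 1 4 5 6])) ;

%// Largest disagreement, converted from days to seconds
maxDiffSec = max(abs(dtAuto - dtPre)) * 86400 ;
fprintf('Largest difference: %g s\n', maxDiffSec) ;
```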

Of course you could also compact the code into a couple of lines:

M = cell2mat(textscan([strDate repmat(' ',size(strDate,1),1)].','%f/%f/%f %f:%f:%f')) ;
dt = datenum( M(:,[3 2 1 4 5 6]) ) ;

But I tested it, and the improvement is so marginal that it is not really worth the loss of readability.

Upvotes: 3
