Reputation: 241
I'm trying to convert a character vector (200,000 rows) into Matlab serial numbers.
The format is '01/07/2015 00:00:59'.
This takes an incredibly long time, and online I can only find tips for solving this in Matlab. Any ideas how I can improve this?
Upvotes: 0
Views: 877
Reputation: 11812
You can use the datenum(datevector) type of input for datenum.
It is much faster than string parsing. I use this trick whenever I have to import long date/time data (which is nearly every day).
It consists of passing an m-by-6 (or m-by-3) matrix containing values representing [yy mm dd HH MM SS] (or [yy mm dd]), with the year given in full (e.g. 2015). The matrix should be of type double.
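As a minimal sketch of the two input forms (the variable names here are illustrative, not from the original post), the same instant can be passed either as a string with a format or as a numeric date vector:

```octave
% Sketch: one instant expressed as a string and as a numeric date vector.
s = '01/07/2015 00:00:59';                 % dd/mm/yyyy HH:MM:SS
v = [2015 7 1 0 0 59];                     % [year month day hour minute second], type double

dt_from_string = datenum(s, 'dd/mm/yyyy HH:MM:SS');   % string parsing (slow path)
dt_from_vector = datenum(v);                          % numeric input (fast path)

% An m-by-6 matrix converts many dates in a single call.
V = [2015 7 1 0 0 59;
     2015 7 1 0 1  0];
dt_many = datenum(V);                      % m-by-1 column of serial dates
```

Both calls should return the same serial date; only the amount of parsing work differs.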
This means that instead of letting Matlab/Octave do the parsing, you read all the numbers from the strings in your favourite way (textscan, fscanf, sscanf, ...), then pass those numbers to datenum instead of the strings.
In the example below I generated a long char array (86401x19) of date strings as sample data:
>> strDate(1:5,:)
ans =
31/07/2015 15:10:13
31/07/2015 15:10:14
31/07/2015 15:10:15
31/07/2015 15:10:16
31/07/2015 15:10:17
To convert that to serial dates faster than the conventional way, I use:
strDate = [strDate repmat(' ',size(strDate,1),1)] ; %// add a whitespace at the end of each line
M = textscan( strDate.' , '%f/%f/%f %f:%f:%f' ) ; %'// read each value independently
M = cell2mat(M) ; %// convert to matrix
M = M(:,[3 2 1 4 5 6]) ; %// reorder columns
dt = datenum(M ) ; %// convert to serial date
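The same pre-parsing idea can be sketched with sscanf instead of textscan, which may be convenient if textscan behaves differently in your environment. This is an illustrative variant under the same assumptions (fixed 'dd/mm/yyyy HH:MM:SS' layout, N-by-19 char matrix), not the code that was benchmarked; the names padded, flat and vals are hypothetical:

```octave
% Sample data: an N-by-19 char matrix of 'dd/mm/yyyy HH:MM:SS' strings.
strDate = ['31/07/2015 15:10:13';
           '31/07/2015 15:10:14';
           '01/08/2015 00:00:59'];

% Pad each row with a trailing space so adjacent rows do not run together,
% then flatten row by row into a single char row vector.
padded = [strDate repmat(' ', size(strDate,1), 1)];
flat   = reshape(padded.', 1, []);

% sscanf recycles the format; asking for [6 Inf] yields one column per date.
% %d reads zero-padded fields such as '07' as decimal numbers.
vals = sscanf(flat, '%d/%d/%d %d:%d:%d', [6 Inf]);   % rows: dd mm yyyy HH MM SS
M    = vals.';                                        % N-by-6, type double
dt   = datenum(M(:, [3 2 1 4 5 6]));                  % reorder to [yyyy mm dd HH MM SS]
```

The trailing space plays the same role as in the textscan version: without it, the seconds of one row would merge with the day of the next.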
This should speed things up in Matlab, and I am fairly sure it will improve things in Octave too. To quantify that, at least in Matlab, here's a quick benchmark:
function test_datenum
d0 = now ;
d = (d0:1/3600/24:d0+1).' ; %// 1 day worth of date (one per second)
strDate = datestr(d,'dd/mm/yyyy HH:MM:SS') ; %'// generate the string array
fprintf('Time with automatic date parsing: %f\n' , timeit(@(x) datenum_auto(strDate)) )
fprintf('Time with customized date parsing: %f\n', timeit(@(x) datenum_preparsed(strDate)) )
function dt = datenum_auto(strDate)
dt = datenum(strDate,'dd/mm/yyyy HH:MM:SS') ; %// let Matlab/Octave do the parsing
function dt = datenum_preparsed(strDate)
strDate = [strDate repmat(' ',size(strDate,1),1)] ; %// add a whitespace at the end of each line
M = textscan( strDate.' , '%f/%f/%f %f:%f:%f' ) ; %'// read each value independently
M = cell2mat(M) ; %// convert to matrix
M = M(:,[3 2 1 4 5 6]) ; %// reorder columns
dt = datenum(M ) ; %// convert to serial date
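If timeit is not available in your environment (worth checking for your Octave version), a rough comparison can be sketched with tic/toc. This is a single-run measurement, so it is noisier than timeit, and it uses an sscanf-based pre-parse rather than the textscan version above; the row count n and the fixed start date are assumptions chosen for illustration:

```octave
% Rough timing sketch using tic/toc (single run, so expect some noise).
n  = 2000;                                      % assumed sample size; increase as needed
d  = datenum(2015, 7, 31) + (0:n-1).' / 86400;  % one timestamp per second
strDate = datestr(d, 'dd/mm/yyyy HH:MM:SS');    % n-by-19 char matrix

% Conventional way: let datenum parse the strings.
tic;
dt_auto = datenum(strDate, 'dd/mm/yyyy HH:MM:SS');
t_auto = toc;

% Pre-parsed way: extract the numbers first, then pass a numeric matrix.
tic;
padded = [strDate repmat(' ', n, 1)];           % trailing space separates rows
vals   = sscanf(reshape(padded.', 1, []), '%d/%d/%d %d:%d:%d', [6 Inf]);
M      = vals.';                                % n-by-6: dd mm yyyy HH MM SS
dt_pre = datenum(M(:, [3 2 1 4 5 6]));          % reorder to [yyyy mm dd HH MM SS]
t_pre = toc;

fprintf('Time with automatic date parsing:  %f\n', t_auto);
fprintf('Time with customized date parsing: %f\n', t_pre);
fprintf('Largest disagreement (seconds):    %g\n', max(abs(dt_auto - dt_pre)) * 86400);
```

The last line is a sanity check that both paths produce the same serial dates for the same input strings.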
On my machine, it yields:
>> test_datenum
Time with automatic date parsing: 0.614698
Time with customized date parsing: 0.073633
Of course you could also compact the code into a couple of lines:
M = cell2mat(textscan([strDate repmat(' ',size(strDate,1),1)].','%f/%f/%f %f:%f:%f')) ;
dt = datenum( M(:,[3 2 1 4 5 6]) ) ;
But I tested it and the improvement is so marginal that it is not really worth the loss of readability.
Upvotes: 3