Reputation: 179
Before asking my question, here's a little background so you understand what I'm doing. I'm looking to analyze a very large data set (a little less than 2,000,000 rows). I've parsed the data set into Matlab and built a structure array from this data, giving names, dates, returns, etc for each asset i. Now, I would like to restrict my data set to being between two days, and Matlab doesn't seem to be particularly amenable to that kind of approach. One suggestion that was given to me was to take the dates, which are of the form MM/DD/YYYY and use a delimiter '/' to somehow build three integer arrays for my data structure (which I'd call stock(i).month, stock(i).day, and stock(i).year). However, nothing I'm doing seems to be working, and I'm very much stuck.
What I have been trying to do is something like the following:
%% Dates
fid = fopen('52c6d3831952b24a.csv');
C = textscan(fid, [repmat('%*s ',1,0),'%s %*[^\n]'], 'delimiter',',');
date = C{1}(2:end,1);
fclose(fid);
for i=1:numStock
locate = strcmp(uniquePermno{i},permno);
stock(i+1).date = date(locate);
end;
for i = 1:numStock
stock(i+1).date = char(stock(i+1).date);
D = textscan(stock(i+1).date, '%s %s %s', 'delimiter','/');
stock(i+1).month = D{1}(1:end);
stock(i+1).day = D{2}(1:end);
stock(i+1).year = D{3}(1:end);
end
I initially wanted to save them as integers (and was using %u instead), but I was getting a strange situation where most of my entries were just 0 and the non-zero ones were very large (obviously not what I expected). However, the above form returns the following error:
Error using textscan
Buffer overflow (bufsize = 4095) while reading string from
file (row 1 u, field 1 u). Use 'bufsize' option. See HELP TEXTSCAN.
44444444444444444444455555555555555555555566666666666666666666677777777777777777777778888888888888888888889999999999999999999990000000000000000000000011111111111111111112222222222222222222222111111111
Error in makeData_CRSP (line 87)
D = textscan(stock(i+1).date, '%s %s %s', 'delimiter','/');
So I'm honestly at a loss for how to approach this. What am I doing wrong? Seeing how I saved my dates vectors for my data structure, is this the best way to approach this problem?
Upvotes: 0
Views: 192
Reputation: 10375
You can use the datenum
function to convert dates into numbers. The syntax is datenum(dateString, format)
. For example, if your dates are in the format YYYY MM DD
then that would be
datenum('2012 12 04', 'yyyy mm dd')
Once you converted all your dates like that you can simply compare the resulting numbers using >
and <
:
>> datenum('2012 12 04', 'yyyy mm dd') > datenum('2012 12 03', 'yyyy mm dd')
ans =
1
>> datenum('2012 12 04', 'yyyy mm dd') > datenum('2012 12 05', 'yyyy mm dd')
ans =
0
Upvotes: 1