I have a large table in MATLAB with 7 variables and about 2 million rows. The first variable (column) holds IDs, the second holds dates, and the third holds prices. For each ID and each date, I want to check whether the price was above 100 on each of the previous 6 days. I have a solution, but it is very slow, so I would like ideas for improving its speed. My solution is the following (with some toy data):
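% Toy data: IDs 1-4, each with 3000 consecutive days (31-Jan-2001 to 18-Apr-2009)
Data = table(reshape(repmat(1:4,3000,1),12000,1), ...
    repmat(datestr(datenum(2001,01,31):1:datenum(2009,04,18)),4,1), ...
    normrnd(200,120,12000,1), ...
    'VariableNames',{'ID','Date','Price'});
Func = @Lag6days;
A = varfun(Func,Data,'GroupingVariables',{'ID'},'InputVariables','Price');

% Local functions in a script go at the end of the file
function y=Lag6days(x)
% y(i) = 1 if the price was above 100 in each of the 6 previous rows
y=zeros(size(x));
for i=7:size(x,1)
    y(i,1)=sum(x(i-6:i-1,1)>100)==6;
end
end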
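The loop in Lag6days can be replaced by a cumulative-sum trick (a swapped-in technique, not part of the question or the answer): if c = cumsum([0; x > 100]), then c(i) - c(i-6) counts how many of rows i-6..i-1 were above 100, so the whole column is computed in one vectorized step. This is a sketch under stated assumptions: the name lag6days_vec is hypothetical, the toy prices come from randn instead of normrnd, and it keeps the original loop's assumption of one row per day, sorted by date. It checks its result against the loop version.

```matlab
% Sketch: vectorized alternative to Lag6days via a cumulative sum
% (swapped-in technique; assumes one row per day, sorted by date)
rng(0);                                  % reproducible toy prices
x = 200 + 120*randn(3000,1);             % randn avoids needing normrnd

yLoop = Lag6days(x);
yVec  = lag6days_vec(x);
disp(isequal(yLoop, yVec))               % should display 1

function y = lag6days_vec(x)
% lag6days_vec (hypothetical name): y(i) = 1 if rows i-6..i-1 are all > 100
ind = double(x(:) > 100);   % 1 where the price exceeds 100
c   = cumsum([0; ind]);     % c(k+1) = number of exceedances in rows 1..k
n   = numel(ind);
y   = zeros(n,1);
i   = (7:n)';
y(i) = (c(i) - c(i-6)) == 6;   % c(i) - c(i-6) counts rows i-6..i-1
end

function y = Lag6days(x)
% Original loop version from the question, for comparison
y = zeros(size(x));
for i = 7:size(x,1)
    y(i,1) = sum(x(i-6:i-1,1) > 100) == 6;
end
end
```

To use it on the full table, only the function handle in the varfun call needs to change, e.g. varfun(@lag6days_vec, Data, 'GroupingVariables',{'ID'}, 'InputVariables','Price'). Any speed-up on the real 2-million-row table would still need to be measured.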
Any suggestions?
The slowness might have something to do with the table data structure, which I'm not really used to.
Consider using 'OutputFormat','cell' in the call to varfun; this seems to work for me.
Of course, you would have to make sure that varfun's grouping is stable, i.e. that it keeps the rows of each group in their original order, so that your dates don't get mixed up.
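A sketch of the 'OutputFormat','cell' call on small toy data. Assumptions: 4 IDs x 30 days, rows already sorted by ID and then by date, and a Day column standing in for real dates. With 'OutputFormat','cell', varfun returns the function outputs in a cell array with one cell per ID group; the cells are stacked back with vertcat, which only lines up with the rows of Data if the groups come back in ascending ID order and each group keeps its original row order.

```matlab
% Sketch: grouped varfun call with cell output (toy data, assumptions above)
rng(0);
nDays = 30;
Data = table(reshape(repmat(1:4,nDays,1),4*nDays,1), ...  % IDs 1..4
             repmat((1:nDays)',4,1), ...                  % Day index per ID
             200 + 120*randn(4*nDays,1), ...              % toy prices
             'VariableNames',{'ID','Day','Price'});

% One cell per ID group, each holding the full Lag6days output vector
C = varfun(@Lag6days, Data, 'GroupingVariables','ID', ...
           'InputVariables','Price', 'OutputFormat','cell');

flag = vertcat(C{:});       % stack the per-group results into one column

function y = Lag6days(x)
% Loop version from the question
y = zeros(size(x));
for i = 7:size(x,1)
    y(i,1) = sum(x(i-6:i-1,1) > 100) == 6;
end
end
```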
Alternatively, you could extract each ID group into a separate vector:
A1 = Lag6days(Data.Price(Data.ID==1));
...
This gives you more control over the order of your dates.
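The per-ID extraction can also be written as a loop over the unique IDs. This is a sketch on small toy data (4 IDs x 30 days, already sorted by date within each ID); the Day column and the output column name Above100Prev6 are hypothetical. Logical indexing keeps the rows of each ID in their original order, and writing the result back through the same mask keeps it aligned with the rows of Data.

```matlab
% Sketch: apply Lag6days to each ID group via logical indexing
rng(0);
nDays = 30;
Data = table(reshape(repmat(1:4,nDays,1),4*nDays,1), ...  % IDs 1..4
             repmat((1:nDays)',4,1), ...                  % Day index per ID
             200 + 120*randn(4*nDays,1), ...              % toy prices
             'VariableNames',{'ID','Day','Price'});

ids  = unique(Data.ID);
flag = zeros(height(Data),1);          % one result per row of Data
for k = 1:numel(ids)
    idx = Data.ID == ids(k);           % mask preserves the original row order
    flag(idx) = Lag6days(Data.Price(idx));
end
Data.Above100Prev6 = flag;             % hypothetical output column name

function y = Lag6days(x)
% Loop version from the question
y = zeros(size(x));
for i = 7:size(x,1)
    y(i,1) = sum(x(i-6:i-1,1) > 100) == 6;
end
end
```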
PS: Your algorithm only works if the prices are already sorted by date and there is exactly one price entry per day, because it looks back 6 rows rather than 6 calendar days. It would be good practice to check these assumptions explicitly.
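Both assumptions can be checked before running Lag6days. This sketch converts the datestr-style dates to serial day numbers with datenum (stored in a hypothetical DateNum column), sorts by ID and then date with sortrows, and asserts that consecutive rows of the same ID are exactly one day apart. The toy data (4 IDs x 30 days) and the 'dd-mmm-yyyy' date format are assumptions chosen to mirror the question's datestr output.

```matlab
% Sketch: sort by ID and date, then verify one price per day per ID
rng(0);
nDays = 30;
dates = datenum(2001,1,31) + (0:nDays-1)';
Data = table(reshape(repmat(1:4,nDays,1),4*nDays,1), ...  % IDs 1..4
             repmat(datestr(dates,'dd-mmm-yyyy'),4,1), ...% char dates
             200 + 120*randn(4*nDays,1), ...              % toy prices
             'VariableNames',{'ID','Date','Price'});

% Char dates do not sort chronologically, so convert to serial day numbers
Data.DateNum = datenum(Data.Date, 'dd-mmm-yyyy');
Data = sortrows(Data, {'ID','DateNum'});   % sort by ID, then by date

% Within each ID, consecutive rows must be exactly one day apart
sameID = diff(Data.ID) == 0;               % adjacent rows with the same ID
gap    = diff(Data.DateNum);
assert(all(gap(sameID) == 1), 'Expected exactly one price per day for each ID');
```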