Reputation: 731
I have an application that runs a simulation on temporal data. The calculations are are not overly complicated however the calculation requires previously predicted temporal data in the same simulation.
Let's assume a simulation using 1 week of data with data points at 15 minute interval. The simplified calculation for a predicted value is as follows:
PredictedValue = A + B + C + D
To get the values of B, C, and D, the simulation requires previously calculated data at t0, t-1, t-95, t-96, t-672 and t-673. This is the temporal data for the current and previous interval, previous day's current and previous interval and the previous week's current and previous interval.
I have a working implementation that simulate data over any time period however performance is extremely poor with large datasets.
The implementation uses a TSQL cursor to loop over the temporal data under simulation and scalar function to retrieve the previously calculated data.
Basically the larger the dataset the slower the simulation runs. For example, a simulation using 1 day of data takes < 1 minute; a simulation with a month of data takes 2-3 days.
I'm really interested in how I can improve the performance of the TSQL code without using a cursor or the scalar functions.
Upvotes: 0
Views: 718
Reputation: 11
In this case I would suggest using a self-join. SQL server is then able to process your query as a set giving a tremendous speed increase.
Simplified example, assuming the table has two columns (date, value) and is called Data
SELECT (A.value + B.value) as Prediction
FROM [Data] A left join [Data] B
ON B.date = dateadd(day,-1, A.date)
(done without testing so may contain typo's - but I hope you get the picture).
Good luck, Otto.
Upvotes: 1