Reputation: 451
This query works (thanks to those that helped) to generate a 30-day moving average of volume.
SELECT x.symbol, x.dseqkey, AVG(y.VOLUME) moving_average
FROM STOCK_HIST x, STOCK_HIST y
WHERE x.dseqkey>=29 AND x.dseqkey BETWEEN y.dseqkey AND y.dseqkey+29
AND Y.Symbol=X.Symbol
GROUP BY x.symbol, x.dseqkey
ORDER BY x.dseqkey DESC
However the performance is very bad. I am running the above against a view (STOCK_HIST) that brings two tables (A and B) together. Table A contains daily stock volume and the daily date for over 9,000 stocks dating back as far as 40 years (300+ rows, per year, per each of the 9,000 stocks). Table B is a "Date Key" table that links the date in table A to the DSEQKEY (int).
What are my options for performance improvement? I have heard that views are convenient but not performant. Should I just copy the columns needed from table A and B to a single table and then run the above query? I have indexes on the tables A and B on the stock symbol + date (A) and DSEQKEY (B).
Is it the view that's killing my performance? How can I improve this?
EDIT
By request, I have posted the 2 tables and the view below. Also, now there is one clustered index on the view and each table. I am open to any recommendations as this query that produces the deisred result, is still slow:
SELECT
x.symbol
, x.dseqkey
, AVG(y.VOLUME) moving_average
FROM STOCK_HIST x
JOIN STOCK_HIST y ON x.dseqkey BETWEEN y.dseqkey AND y.dseqkey+29 AND Y.Symbol=X.Symbol
WHERE x.dseqkey >= 15000
GROUP BY x.symbol, x.dseqkey
ORDER BY x.dseqkey DESC ;
HERE IS THE VIEW:
CREATE VIEW [dbo].[STOCK_HIST]
WITH SCHEMABINDING
AS
SELECT
dbo.DATE_MASTER.date
, dbo.DATE_MASTER.year
, dbo.DATE_MASTER.quarter
, dbo.DATE_MASTER.month
, dbo.DATE_MASTER.week
, dbo.DATE_MASTER.wday
, dbo.DATE_MASTER.day
, dbo.DATE_MASTER.nday
, dbo.DATE_MASTER.wkmax
, dbo.DATE_MASTER.momax
, dbo.DATE_MASTER.qtrmax
, dbo.DATE_MASTER.yrmax
, dbo.DATE_MASTER.dseqkey
, dbo.DATE_MASTER.wseqkey
, dbo.DATE_MASTER.mseqkey
, dbo.DATE_MASTER.qseqkey
, dbo.DATE_MASTER.yseqkey
, dbo.DATE_MASTER.tom
, dbo.QP_HISTORY.Symbol
, dbo.QP_HISTORY.[Open] as propen
, dbo.QP_HISTORY.High as prhigh
, dbo.QP_HISTORY.Low as prlow
, dbo.QP_HISTORY.[Close] as prclose
, dbo.QP_HISTORY.Volume
, dbo.QP_HISTORY.QRS
FROM dbo.DATE_MASTER
INNER JOIN dbo.QP_HISTORY ON dbo.DATE_MASTER.date = dbo.QP_HISTORY.QPDate ;
HERE IS DATE_MASTER TABLE:
CREATE TABLE [dbo].[DATE_MASTER] (
[date] [datetime] NULL
, [year] [int] NULL
, [quarter] [int] NULL
, [month] [int] NULL
, [week] [int] NULL
, [wday] [int] NULL
, [day] [int] NULL
, [nday] nvarchar NULL
, [wkmax] [bit] NOT NULL
, [momax] [bit] NOT NULL
, [qtrmax] [bit] NOT NULL
, [yrmax] [bit] NOT NULL
, [dseqkey] [int] IDENTITY(1,1) NOT NULL
, [wseqkey] [int] NULL
, [mseqkey] [int] NULL
, [qseqkey] [int] NULL
, [yseqkey] [int] NULL
, [tom] [bit] NOT NULL
) ON [PRIMARY] ;
HERE IS THE QP_HISTORY TABLE:
CREATE TABLE [dbo].[QP_HISTORY] (
[Symbol] varchar NULL
, [QPDate] [date] NULL
, [Open] [real] NULL
, [High] [real] NULL
, [Low] [real] NULL
, [Close] [real] NULL
, [Volume] [bigint] NULL
, [QRS] [smallint] NULL
) ON [PRIMARY] ;
HERE IS THE VIEW (STOCK_HIST) INDEX
CREATE UNIQUE CLUSTERED INDEX [ix_STOCK_HIST] ON [dbo].[STOCK_HIST]
(
[Symbol] ASC,
[dseqkey] ASC,
[Volume] ASC
)
HERE IS THE QP_HIST INDEX
CREATE UNIQUE CLUSTERED INDEX [IX_QP_HISTORY] ON [dbo].[QP_HISTORY]
(
[Symbol] ASC,
[QPDate] ASC,
[Close] ASC,
[Volume] ASC
)
HERE IS THE INDEX ON DATE_MASTER
CREATE UNIQUE CLUSTERED INDEX [IX_DATE_MASTER] ON [dbo].[DATE_MASTER]
(
[date] ASC,
[dseqkey] ASC,
[wseqkey] ASC,
[mseqkey] ASC
)
I do not have any primary keys setup. Would this help performance?
EDIT - After making suggested changes the query is slower than before. What ran in 10m 44s is currently at 30m and still running.
I made all of the requested changes except I did not change name of date in Date_Master and I did not drop the QPDate column from QP_Hist. (I have reasons for this and do not see it impacting the performance since I'm not referring to it in the query.)
REVISED QUERY
select x.symbol, x.dmdseqkey, avg(y.volume) as moving_average
from dbo.QP_HISTORY as x
join dbo.QP_HISTORY as y on (x.dmdseqkey between y.dmdseqkey and (y.dmdseqkey + 29))
and (y.symbol = x.symbol)
where x.dmdseqkey >= 20000
group by x.symbol, x.dmdseqkey
order by x.dmdseqkey desc ;
PK on QP_History
ALTER TABLE [dbo].[QP_HISTORY]
ADD CONSTRAINT [PK_QP_HISTORY] PRIMARY KEY CLUSTERED ([Symbol] ASC, DMDSeqKey] ASC)
FK on QP_History
ALTER TABLE [dbo].[QP_HISTORY] ADD CONSTRAINT [FK_QP_HISTORY_DATE_MASTER] FOREIGN KEY([DMDSeqKey]) REFERENCES [dbo].[DATE_MASTER] ([dseqkey])
PK on Date_Master
ALTER TABLE [dbo].[DATE_MASTER]
ADD CONSTRAINT [PK_DATE_MASTER] PRIMARY KEY CLUSTERED ([dseqkey] ASC)
EDIT
HERE IS THE EXECUTION PLAN
Upvotes: 1
Views: 4350
Reputation: 22187
OK, so I'll start from the end. I would like to achieve this model.
With this in place, you can run the query on the history table directly, no need for the view and join to the dbo.DATE_MASTER
.
select
x.symbol
, x.dseqkey
, avg(y.volume) as moving_average
from dbo.QP_HISTORY as x
join dbo.QP_HISTORY as y on (x.dSeqKey between y.dSeqKey and (y.dSeqKey + 29))
and (y.symbol = x.symbol)
where x.dseqkey >= 15000
group by x.symbol, x.dseqkey
order by x.dseqkey desc
OPTION (ORDER GROUP) ;
The QP_HISTORY
is narrower than the STOCK_HISTORY
view, so the query should be faster. The "redundant column removal" from joins is scheduled for the next generation of SQL Server (Denali), so for the time being narrower usually means faster -- at least for large tables. Also, the join on ..
and the where
clause nicely match the the PK(Symbol, dSeqKey)
.
Now, how to achieve this:
a) Modify the [date]
column in dbo.DATE_MASTER
to be if the type date
instead of datetime
. Rename it FullDate
to avoid confusion. Not absolutely necessary, but to preserve my sanity.
b) Add PK to the dbo.DATE_MASTER
alter table dbo.DATE_MASTER add constraint primary key pk_datemstr (dSeqKey);
c) In the table QP_HISTORY
add column dSeqKey
and populate it for matching QPDate
dates.
d) Drop the QPDate
column from the table.
e) Add PK and FK to the QP_HISTORY
alter table dbo.QP_HISTORY
add constraint pk_qphist primary key (Symbol, dSeqKey)
, add constraint fk1_qphist foreign key (dSeqKey)
references dbo.DATE_MASTER(dSeqKey) ;
f) Drop all those indexes mentioned at the end ouf your question, at least for the time being.
g) I do not see the size of the Symbol
field. Define it as narrow as possible.
h) Needles to say, implement and test this on a development system first.
Upvotes: 0
Reputation: 432261
First, separate join an filter.
(edit: fixed ON clause)
SELECT x.symbol, x.dseqkey, AVG(y.VOLUME) moving_average
FROM
STOCK_HIST x
JOIN
STOCK_HIST y ON x.dseqkey BETWEEN y.dseqkey AND y.dseqkey+29
AND Y.Symbol=X.Symbol
WHERE x.dseqkey>=29
GROUP BY x.symbol, x.dseqkey
ORDER BY x.dseqkey DESC
Also, what indexes do you have - I'd suggest an index on (dseqkey, symbol) INCLUDE (VOLUME)
Edit 3: you can't have an INCLUDE in a clustered index, my bad. Your syntax is OK.
Please try these permutations... the aim is find the best index for the JOIN and WHERE, followed with the ORDER BY.
CREATE UNIQUE CLUSTERED INDEX [ix_STOCK_HIST] ON [dbo].[STOCK_HIST] (...
...[Symbol] ASC, [dseqkey] ASC, [Volume] ASC )
...[dseqkey] ASC, [Symbol] ASC, [Volume] ASC )
...[Symbol] ASC, [dseqkey] DESC, [Volume] ASC )
...[dseqkey] DESC, [Symbol] ASC, [Volume] ASC )
Upvotes: 4
Reputation: 425371
SQL Server
does not support LAG
or LEAD
clauses available in Oracle
and PostgreSQL
, neither does it support session variables like MySQL
.
Calculating aggregates against moving windows is a pain in SQL Server
.
So God knows I hate to say this, however, in this case a CURSOR
based solution may be more efficient.
Upvotes: 3
Reputation: 15677
try putting a clustered index on the view. that will make the view persisted to disk like a normal table and your tables won't have to be accessed every time.
that should speed things up a bit.
for better answer please post the link to your original question to see if a better solution can be found.
Upvotes: 1