Reputation: 451

Aggregate Function/Group-By Query Performance

This query works (thanks to those that helped) to generate a 30-day moving average of volume.

SELECT x.symbol, x.dseqkey, AVG(y.VOLUME) moving_average
FROM STOCK_HIST x, STOCK_HIST y
WHERE x.dseqkey>=29 AND x.dseqkey BETWEEN y.dseqkey AND y.dseqkey+29
AND Y.Symbol=X.Symbol 
GROUP BY x.symbol, x.dseqkey
ORDER BY x.dseqkey DESC

However the performance is very bad. I am running the above against a view (STOCK_HIST) that brings two tables (A and B) together. Table A contains daily stock volume and the daily date for over 9,000 stocks dating back as far as 40 years (300+ rows, per year, per each of the 9,000 stocks). Table B is a "Date Key" table that links the date in table A to the DSEQKEY (int).

What are my options for performance improvement? I have heard that views are convenient but not performant. Should I just copy the columns needed from table A and B to a single table and then run the above query? I have indexes on the tables A and B on the stock symbol + date (A) and DSEQKEY (B).

Is it the view that's killing my performance? How can I improve this?

EDIT

By request, I have posted the 2 tables and the view below. Also, now there is one clustered index on the view and each table. I am open to any recommendations as this query that produces the deisred result, is still slow:

SELECT
      x.symbol
    , x.dseqkey
    , AVG(y.VOLUME) moving_average
FROM STOCK_HIST x
JOIN STOCK_HIST y ON x.dseqkey BETWEEN y.dseqkey AND y.dseqkey+29 AND Y.Symbol=X.Symbol
WHERE    x.dseqkey >= 15000
GROUP BY x.symbol, x.dseqkey
ORDER BY x.dseqkey DESC ;

HERE IS THE VIEW:

CREATE VIEW [dbo].[STOCK_HIST]
WITH SCHEMABINDING
AS
SELECT
      dbo.DATE_MASTER.date
    , dbo.DATE_MASTER.year
    , dbo.DATE_MASTER.quarter
    , dbo.DATE_MASTER.month
    , dbo.DATE_MASTER.week
    , dbo.DATE_MASTER.wday
    , dbo.DATE_MASTER.day
    , dbo.DATE_MASTER.nday
    , dbo.DATE_MASTER.wkmax
    , dbo.DATE_MASTER.momax
    , dbo.DATE_MASTER.qtrmax
    , dbo.DATE_MASTER.yrmax
    , dbo.DATE_MASTER.dseqkey
    , dbo.DATE_MASTER.wseqkey
    , dbo.DATE_MASTER.mseqkey
    , dbo.DATE_MASTER.qseqkey
    , dbo.DATE_MASTER.yseqkey
    , dbo.DATE_MASTER.tom
    , dbo.QP_HISTORY.Symbol
    , dbo.QP_HISTORY.[Open]  as propen
    , dbo.QP_HISTORY.High    as prhigh
    , dbo.QP_HISTORY.Low     as prlow
    , dbo.QP_HISTORY.[Close] as prclose
    , dbo.QP_HISTORY.Volume
    , dbo.QP_HISTORY.QRS
FROM dbo.DATE_MASTER
INNER JOIN dbo.QP_HISTORY ON dbo.DATE_MASTER.date = dbo.QP_HISTORY.QPDate ;

HERE IS DATE_MASTER TABLE:

CREATE TABLE [dbo].[DATE_MASTER] (
      [date]    [datetime] NULL
    , [year]    [int] NULL
    , [quarter] [int] NULL
    , [month]   [int] NULL
    , [week]    [int] NULL
    , [wday]    [int] NULL
    , [day]     [int] NULL
    , [nday]    nvarchar NULL
    , [wkmax]   [bit] NOT NULL
    , [momax]   [bit] NOT NULL
    , [qtrmax]  [bit] NOT NULL
    , [yrmax]   [bit] NOT NULL
    , [dseqkey] [int] IDENTITY(1,1) NOT NULL
    , [wseqkey] [int] NULL
    , [mseqkey] [int] NULL
    , [qseqkey] [int] NULL
    , [yseqkey] [int] NULL
    , [tom]     [bit] NOT NULL
) ON [PRIMARY] ;

HERE IS THE QP_HISTORY TABLE:

CREATE TABLE [dbo].[QP_HISTORY] (
      [Symbol] varchar    NULL
    , [QPDate] [date]     NULL
    , [Open]   [real]     NULL
    , [High]   [real]     NULL
    , [Low]    [real]     NULL
    , [Close]  [real]     NULL
    , [Volume] [bigint]   NULL
    , [QRS]    [smallint] NULL
) ON [PRIMARY] ;

HERE IS THE VIEW (STOCK_HIST) INDEX

CREATE UNIQUE CLUSTERED INDEX [ix_STOCK_HIST] ON [dbo].[STOCK_HIST]   
(
    [Symbol] ASC,  
    [dseqkey] ASC,  
    [Volume] ASC  
)

HERE IS THE QP_HIST INDEX

CREATE UNIQUE CLUSTERED INDEX [IX_QP_HISTORY] ON [dbo].[QP_HISTORY] 
(
    [Symbol] ASC,
    [QPDate] ASC,
    [Close] ASC,
    [Volume] ASC
)

HERE IS THE INDEX ON DATE_MASTER

CREATE UNIQUE CLUSTERED INDEX [IX_DATE_MASTER] ON [dbo].[DATE_MASTER] 
(
    [date] ASC,
    [dseqkey] ASC,
    [wseqkey] ASC,
    [mseqkey] ASC
)

I do not have any primary keys setup. Would this help performance?

EDIT - After making suggested changes the query is slower than before. What ran in 10m 44s is currently at 30m and still running.

I made all of the requested changes except I did not change name of date in Date_Master and I did not drop the QPDate column from QP_Hist. (I have reasons for this and do not see it impacting the performance since I'm not referring to it in the query.)

REVISED QUERY

select x.symbol, x.dmdseqkey, avg(y.volume) as moving_average 
from dbo.QP_HISTORY as x
join dbo.QP_HISTORY as y on (x.dmdseqkey between y.dmdseqkey and (y.dmdseqkey + 29))                          
                         and (y.symbol = x.symbol)  
where x.dmdseqkey >= 20000 
group by x.symbol, x.dmdseqkey 
order by x.dmdseqkey desc ;

PK on QP_History

ALTER TABLE [dbo].[QP_HISTORY]
   ADD CONSTRAINT [PK_QP_HISTORY] PRIMARY KEY CLUSTERED ([Symbol] ASC, DMDSeqKey] ASC)

FK on QP_History

ALTER TABLE [dbo].[QP_HISTORY] ADD  CONSTRAINT [FK_QP_HISTORY_DATE_MASTER] FOREIGN KEY([DMDSeqKey]) REFERENCES [dbo].[DATE_MASTER] ([dseqkey])

PK on Date_Master

ALTER TABLE [dbo].[DATE_MASTER]
 ADD  CONSTRAINT [PK_DATE_MASTER] PRIMARY KEY CLUSTERED ([dseqkey] ASC)

EDIT

HERE IS THE EXECUTION PLAN

Upvotes: 1

Answers (4)

Damir Sudarevic

Reputation: 22187

OK, so I'll start from the end. I would like to achieve this model.

enter image description here

With this in place, you can run the query on the history table directly, no need for the view and join to the dbo.DATE_MASTER.

select
      x.symbol
    , x.dseqkey
    , avg(y.volume) as moving_average
from dbo.QP_HISTORY as x
join dbo.QP_HISTORY as y on  (x.dSeqKey between y.dSeqKey and (y.dSeqKey + 29))
                         and (y.symbol = x.symbol) 
where x.dseqkey >= 15000
group by x.symbol, x.dseqkey
order by x.dseqkey desc 
OPTION (ORDER GROUP) ;

The QP_HISTORY is narrower than the STOCK_HISTORY view, so the query should be faster. The "redundant column removal" from joins is scheduled for the next generation of SQL Server (Denali), so for the time being narrower usually means faster -- at least for large tables. Also, the join on .. and the where clause nicely match the the PK(Symbol, dSeqKey).

Now, how to achieve this:

a) Modify the [date] column in dbo.DATE_MASTER to be if the type date instead of datetime. Rename it FullDate to avoid confusion. Not absolutely necessary, but to preserve my sanity.

b) Add PK to the dbo.DATE_MASTER

alter table dbo.DATE_MASTER add constraint primary key pk_datemstr (dSeqKey);

c) In the table QP_HISTORY add column dSeqKey and populate it for matching QPDate dates.

d) Drop the QPDate column from the table.

e) Add PK and FK to the QP_HISTORY

alter table dbo.QP_HISTORY
   add constraint pk_qphist  primary key (Symbol, dSeqKey)
 , add constraint fk1_qphist foreign key (dSeqKey)
                             references dbo.DATE_MASTER(dSeqKey) ;

f) Drop all those indexes mentioned at the end ouf your question, at least for the time being.

g) I do not see the size of the Symbol field. Define it as narrow as possible.

h) Needles to say, implement and test this on a development system first.

Upvotes: 0

gbn

Reputation: 432261

First, separate join an filter.

(edit: fixed ON clause)

SELECT x.symbol, x.dseqkey, AVG(y.VOLUME) moving_average
FROM
    STOCK_HIST x
    JOIN
    STOCK_HIST y ON x.dseqkey BETWEEN y.dseqkey AND y.dseqkey+29
                                AND Y.Symbol=X.Symbol 
WHERE x.dseqkey>=29
GROUP BY x.symbol, x.dseqkey
ORDER BY x.dseqkey DESC

Also, what indexes do you have - I'd suggest an index on (dseqkey, symbol) INCLUDE (VOLUME)

Edit 3: you can't have an INCLUDE in a clustered index, my bad. Your syntax is OK.

Please try these permutations... the aim is find the best index for the JOIN and WHERE, followed with the ORDER BY.

CREATE UNIQUE CLUSTERED INDEX [ix_STOCK_HIST] ON [dbo].[STOCK_HIST] (...

    ...[Symbol] ASC, [dseqkey] ASC, [Volume] ASC )

    ...[dseqkey] ASC, [Symbol] ASC, [Volume] ASC )

    ...[Symbol] ASC, [dseqkey] DESC, [Volume] ASC )

    ...[dseqkey] DESC, [Symbol] ASC, [Volume] ASC )

Upvotes: 4

Quassnoi

Reputation: 425371

SQL Server does not support LAG or LEAD clauses available in Oracle and PostgreSQL, neither does it support session variables like MySQL.

Calculating aggregates against moving windows is a pain in SQL Server.

So God knows I hate to say this, however, in this case a CURSOR based solution may be more efficient.

Upvotes: 3

Mladen Prajdic

Reputation: 15677

try putting a clustered index on the view. that will make the view persisted to disk like a normal table and your tables won't have to be accessed every time.

that should speed things up a bit.

for better answer please post the link to your original question to see if a better solution can be found.

Upvotes: 1

Aggregate Function/Group-By Query Performance

Answers (4)

Related Questions