Baron von Connor

Reputation: 48

T-SQL Queries and C# - when performance matters

So, OK, I couldn't think of a great way of expressing this in a title, but the scenario is this:

You're building a report. The report is on an .aspx page with a C# code-behind which queries a SQL Server database with T-SQL.

The table you're getting the data from is pretty darn large (millions of rows). Two of the columns you need (Group and SuperGroup - see below) have to be looked up in another table, and that lookup table just happens to be a good few tens of thousands of rows too (not to mention you actually have to join two tables to build the lookup properly - see #partGroups below).

Now bearing in mind the page running this will time out after 2 minutes...

Here are a few assumptions that have to be made along the way:

  1. The tables and their layout are immutable; regardless of whether the design is bad or whatever, they are what they are and you have to work with them (Assets, CoreStockParts and CoreStockPartsGroups).
  2. The page timeout can NOT be altered.
  3. Part numbers (Text01 in Assets, PartNo in CoreStockParts) can and do contain hyphens and/or spaces in one table but not the other, so they need removing.
  4. Part numbers in Assets are sometimes prefixed with a character that isn't present in CoreStockParts.

This is basically what I've got so far:

    select rtrim(ltrim(Replace(Replace(csp.PartNo,' ',''), '-',''))) as PartNumber, 
        csp.[Description], csp.GroupCode, coalesce(cspg.[Group], 'Unknown') as [Group], coalesce(cspg.SuperGroup, 'Unknown') as SuperGroup
    into #partGroups
    from CoreStockParts as csp
        left join CoreStockPartsGroups as cspg on csp.GroupCode = cspg.Code

    select p.ID, 
        rtrim(Replace(Replace(p.Text01,' ',''), '-','')) as PartNumber1, 
        right(p.Text01, len(p.Text01)-1) as PartNumber2,
        p.Numeric01 as CostAmount, p.Numeric02 as SaleAmount, p.Numeric03 as ExtendedCostAmount, 
        p.Numeric04 as ExtendedSaleAmount, p.Numeric05 as Quantity, p.Date01 as InvoiceDate
    INTO #coreParts
    FROM Assets as p
    WHERE p.Category = 'PART'
    and len(p.Text01) > 0

    select ID, PartNumber1, PartNumber2, [Description], CostAmount, SaleAmount, ExtendedCostAmount, 
        ExtendedSaleAmount, Quantity, InvoiceDate, [Group], SuperGroup
    from #coreParts as cp
        inner join #partGroups as pg on cp.PartNumber1 = pg.PartNumber
    union
    select ID, PartNumber1, PartNumber2, [Description], CostAmount, SaleAmount, ExtendedCostAmount, 
        ExtendedSaleAmount, Quantity, InvoiceDate, [Group], SuperGroup
    from #coreParts as cp
        inner join #partGroups as pg on cp.PartNumber2 = pg.PartNumber

This is currently finishing in about 1 minute and 45 seconds under a medium server load. There are still restrictions that need adding, which include but are not limited to filtering on Group, SuperGroup and a date range on InvoiceDate. On top of this, once I finally HAVE this data I then need to start performing aggregate functions across it to produce graphs of sales quantities/values etc. for various Groups/SuperGroups.
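
By way of illustration, the later filtering and aggregation would probably end up looking something like this, assuming the UNION result above is first written INTO a temp table (here called #report; the parameter names and the 'Consumables' value are made up for the example):

    -- Sketch only: #report, @FromDate and @ToDate are placeholder names
    declare @FromDate date = '2014-01-01', @ToDate date = '2015-01-01';

    select SuperGroup, [Group],
        sum(Quantity)           as TotalQuantity,
        sum(ExtendedSaleAmount) as TotalSales,
        sum(ExtendedCostAmount) as TotalCost
    from #report                            -- the UNION result above, written INTO #report
    where InvoiceDate >= @FromDate
        and InvoiceDate < @ToDate
        and SuperGroup = 'Consumables'      -- example Group/SuperGroup restriction
    group by SuperGroup, [Group]
    order by TotalSales desc;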

Now I'm THINKING if I can keep it to this speed... that will do, though it's hardly ideal. If I can speed it up then great! Anything much more than 15 seconds slower, however, and we hit a wall.

So the crux of this is, I guess, several questions:

  1. Am I missing anything obvious that I could be doing to optimize this in general?
  2. Would it be better at this point to return the results to C# and LINQ the numbers I need?
  3. I THINK if I'm filtering in the T-SQL, the best place to do so would be in the SELECT INTOs that build the temporary tables rather than in the resulting mash in the last statement? (A rough sketch of what I mean follows.)
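
For example, pushing the report's date range down into the #coreParts build would look roughly like this (@FromDate / @ToDate being placeholder parameter names supplied from the page):

    -- Sketch only: filter as early as possible so fewer rows reach the joins and the UNION
    declare @FromDate date = '2014-01-01', @ToDate date = '2015-01-01';

    select p.ID,
        rtrim(Replace(Replace(p.Text01,' ',''), '-','')) as PartNumber1,
        right(p.Text01, len(p.Text01)-1) as PartNumber2,
        p.Numeric01 as CostAmount, p.Numeric02 as SaleAmount, p.Numeric03 as ExtendedCostAmount,
        p.Numeric04 as ExtendedSaleAmount, p.Numeric05 as Quantity, p.Date01 as InvoiceDate
    INTO #coreParts
    FROM Assets as p
    WHERE p.Category = 'PART'
    and len(p.Text01) > 0
    and p.Date01 >= @FromDate
    and p.Date01 < @ToDate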

EDIT: Ok some updates on this!

Firstly, it seems I was wrong in my assessment of what would be allowed: we've got authorization to add a snapshot table, populated overnight, which does all the work of pulling the data we need together so the report code can run against it as and when needed the following day.
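
The overnight job will most likely just run the statements above and then write the combined result into a permanent table, roughly along these lines (dbo.ReportPartSales is a placeholder name; the scheduling itself would be a SQL Server Agent job):

    -- Rough sketch of the nightly snapshot rebuild (names are placeholders)
    if object_id('dbo.ReportPartSales', 'U') is not null
        drop table dbo.ReportPartSales;

    select ID, PartNumber1, PartNumber2, [Description], CostAmount, SaleAmount, ExtendedCostAmount,
        ExtendedSaleAmount, Quantity, InvoiceDate, [Group], SuperGroup
    into dbo.ReportPartSales
    from #coreParts as cp
        inner join #partGroups as pg on cp.PartNumber1 = pg.PartNumber
    union
    select ID, PartNumber1, PartNumber2, [Description], CostAmount, SaleAmount, ExtendedCostAmount,
        ExtendedSaleAmount, Quantity, InvoiceDate, [Group], SuperGroup
    from #coreParts as cp
        inner join #partGroups as pg on cp.PartNumber2 = pg.PartNumber;

    -- Index the snapshot for the report's date-range filter
    create clustered index IX_ReportPartSales_InvoiceDate on dbo.ReportPartSales (InvoiceDate);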

Special thanks to Blindy and user17594 for your input regarding indexing and the bits that would prevent indexes being used (bits, that's technical language ya know 8D).

Upvotes: 0

Views: 171

Answers (4)

GarethD

Reputation: 69769

My approach to this would be to add computed columns to your tables:

    ALTER TABLE CoreStockParts
    ADD PartNumber AS RTRIM(LTRIM(REPLACE(REPLACE(PartNo, ' ', ''), '-', '')));

    ALTER TABLE Assets
    ADD PartNumber1 AS RTRIM(REPLACE(REPLACE(Text01, ' ', ''), '-', '')),
        -- CASE guards against empty strings, where RIGHT(Text01, -1) would raise an error
        PartNumber2 AS CASE WHEN LEN(Text01) > 0 THEN RIGHT(Text01, LEN(Text01) - 1) END;

Now that you have these computed columns you can index them:

    CREATE NONCLUSTERED INDEX IX_CoreStockParts_PartNumber ON CoreStockParts (PartNumber)
        --INCLUDE ([Description], GroupCode)

Not sure of your exact table structure, but including non-key columns in the index may assist performance; uncomment the INCLUDE line if you think they will.

    CREATE NONCLUSTERED INDEX IX_Assets_PartNumber1 ON Assets (PartNumber1)
        --WHERE Category = 'PART' AND Text01 <> ''

    CREATE NONCLUSTERED INDEX IX_Assets_PartNumber2 ON Assets (PartNumber2)
        --WHERE Category = 'PART' AND Text01 <> ''

Depending on how you will use the table elsewhere, these indexes may benefit from being filtered indexes; again, uncomment the WHERE lines if necessary (note that a filtered index predicate cannot use LEN(), hence Text01 <> ''). You may also want further indexes on CoreStockPartsGroups.Code and CoreStockParts.GroupCode.
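
Those might look something like the following; treat them as illustrative, since the right key and INCLUDE choices depend on the actual table definitions:

    -- Supports the lookup join from CoreStockParts.GroupCode to CoreStockPartsGroups.Code
    CREATE NONCLUSTERED INDEX IX_CoreStockPartsGroups_Code
        ON CoreStockPartsGroups (Code) INCLUDE ([Group], SuperGroup);

    CREATE NONCLUSTERED INDEX IX_CoreStockParts_GroupCode
        ON CoreStockParts (GroupCode);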

Finally, I would not use temporary tables; it is usually better to avoid them, as you lose the benefit of existing indexes on the source tables:

    WITH PartGroups AS
    (   SELECT  csp.PartNumber,
                csp.[Description],
                csp.GroupCode,
                ISNULL(cspg.[Group], 'Unknown') AS [Group],
                ISNULL(cspg.SuperGroup, 'Unknown') AS SuperGroup
        FROM    CoreStockParts AS csp
                LEFT JOIN CoreStockPartsGroups AS cspg
                    ON csp.GroupCode = cspg.Code
    )
    SELECT  p.ID,
            p.PartNumber1,
            p.PartNumber2,
            pg.[Description],
            CostAmount = p.Numeric01,
            SaleAmount = p.Numeric02,
            ExtendedCostAmount = p.Numeric03,
            ExtendedSaleAmount = p.Numeric04,
            Quantity = p.Numeric05,
            InvoiceDate = p.Date01,
            pg.[Group],
            pg.SuperGroup
    FROM    Assets AS p
            INNER JOIN PartGroups AS pg
                ON p.PartNumber1 = pg.PartNumber
    WHERE   p.Category = 'PART'
    AND     LEN(p.Text01) > 0

    UNION

    SELECT  p.ID,
            p.PartNumber1,
            p.PartNumber2,
            pg.[Description],
            CostAmount = p.Numeric01,
            SaleAmount = p.Numeric02,
            ExtendedCostAmount = p.Numeric03,
            ExtendedSaleAmount = p.Numeric04,
            Quantity = p.Numeric05,
            InvoiceDate = p.Date01,
            pg.[Group],
            pg.SuperGroup
    FROM    Assets AS p
            INNER JOIN PartGroups AS pg
                ON p.PartNumber2 = pg.PartNumber
    WHERE   p.Category = 'PART'
    AND     LEN(p.Text01) > 0;

Running the query in SSMS with 'Include Actual Execution Plan' enabled will suggest further improvements, such as additional indexes.

Upvotes: 0

Binary Worrier

Reputation: 51711

Another idea is to create a Reporting Data service.

This runs over the existing data, populating a separate database with a structure optimized for reporting. Your reports then run over this, which should be orders of magnitude faster, given that all the heavy lifting has already been done by the reporting service.

One drawback is that the data will be "stale" to some degree, depending on how quickly changes propagate into the reporting database. Lots of businesses won't notice or care that data is 60 or 90 seconds out of date, but it obviously depends on your own use case.
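
Purely as an illustration of the refresh step, assuming a reporting table dbo.ReportingSales keyed on the Assets ID and a LastModified column on Assets (none of which exist in the question's schema as given), an incremental update could look something like:

    -- Hypothetical incremental refresh; dbo.ReportingSales and LastModified are assumed names
    declare @LastRun datetime = dateadd(minute, -2, getdate());   -- e.g. the service runs every couple of minutes

    merge dbo.ReportingSales as target
    using
    (   select ID, Numeric04 as ExtendedSaleAmount, Numeric05 as Quantity, Date01 as InvoiceDate
        from Assets
        where Category = 'PART'
        and LastModified > @LastRun
    ) as source
        on target.ID = source.ID
    when matched then
        update set ExtendedSaleAmount = source.ExtendedSaleAmount,
                   Quantity = source.Quantity,
                   InvoiceDate = source.InvoiceDate
    when not matched then
        insert (ID, ExtendedSaleAmount, Quantity, InvoiceDate)
        values (source.ID, source.ExtendedSaleAmount, source.Quantity, source.InvoiceDate);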

Upvotes: 1

CElliott

Reputation: 452

You may wish to try a few things to see which, if any, will work. Always note that with any change you need to test, test, and test some more to see if it is improving performance on your systems.

  • Use UNION ALL instead of UNION to remove the DISTINCT check that is associated with UNION (unless you know the results would otherwise contain duplicates you need removed)
  • Alternatively, remove your UNION and replace your join with pg.PartNumber IN (cp.PartNumber1, cp.PartNumber2) - see the sketch after this list
  • Check your indices. The predicate and len(p.Text01) > 0 will prevent the use of an index seek; try AND p.Text01 IS NOT NULL AND p.Text01 != '' instead
  • Try indexing your temp tables: CREATE UNIQUE CLUSTERED INDEX IX_partGroups_tmp ON #partGroups (PartNumber), as well as CREATE NONCLUSTERED INDEX IX_coreParts_tmp ON #coreParts (PartNumber1, PartNumber2)
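
Pulling a couple of those suggestions together, the combining query might end up looking something like this (a sketch only, against the temp tables from the question):

    CREATE UNIQUE CLUSTERED INDEX IX_partGroups_tmp ON #partGroups (PartNumber);
    CREATE NONCLUSTERED INDEX IX_coreParts_tmp ON #coreParts (PartNumber1, PartNumber2);

    -- Single join in place of the UNION, matching on either cleaned-up part number
    SELECT cp.ID, cp.PartNumber1, cp.PartNumber2, pg.[Description], cp.CostAmount, cp.SaleAmount,
        cp.ExtendedCostAmount, cp.ExtendedSaleAmount, cp.Quantity, cp.InvoiceDate, pg.[Group], pg.SuperGroup
    FROM #coreParts AS cp
        INNER JOIN #partGroups AS pg ON pg.PartNumber IN (cp.PartNumber1, cp.PartNumber2);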

Try to keep your data processing within the database, as the data platform is optimized for this. Filter as much as you can and return only what you need to your client.

Upvotes: 2

Gusman

Reputation: 15151

Instead of doing your reports in an asp page, use a background service:

  1. Create a service which waits for queries (through pipes, sockets, WCF, whatever you like...).

  2. When you need a report, ask the service from your ASP page whether that report already exists; if not, tell the service to create it.

  3. Show the user a "Loading" message and, via Ajax, have your page poll the service to see whether the report is ready.

  4. When the service has "munched" all the data, present your reports/charts through your ASP page.

It's always a good idea to separate long-running processes from web pages, both to avoid timeouts and to avoid leaving the user with a page which looks hung (imagine a very nervous user who starts pressing F5 because the page is taking 2 minutes to render... your server will crash under the weight of all the reports being generated).
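
If you want to keep the coordination in the database itself, one simple option is a small request/status table that both the page and the service share (all of these names are made up for illustration):

    -- Hypothetical queue table shared by the ASP page and the background service
    create table dbo.ReportRequests
    (
        RequestID   int identity(1,1) primary key,
        ReportKey   nvarchar(200) not null,                      -- e.g. group + date range, encoded by the page
        Status      varchar(20)   not null default ('Pending'),  -- Pending / Running / Ready / Failed
        RequestedAt datetime      not null default (getdate()),
        CompletedAt datetime      null
    );

    -- Step 2: the page queues a request...
    -- insert into dbo.ReportRequests (ReportKey) values (@ReportKey);
    -- Step 3: ...and then polls its status via Ajax:
    -- select Status from dbo.ReportRequests where RequestID = @RequestID;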

Upvotes: 3
