Reputation: 515
I'm working with a database where the details of a phone call a user places with a client are saved into a single VARCHAR(MAX) column. For detailed reporting purposes (SSRS), however, all of the various attributes embedded within the VARCHAR column need to be turned into separate columns. To accomplish this parsing, I'm using a cursor in a stored procedure which calls a UDF that handles the extraction of the desired substring.
When querying a relatively small number of rows (no more than a few hundred), the performance of the query is decent. However, when querying a larger number of rows (several thousand), performance is terrible and the query will often time out. So, I'm looking for an approach that scales far better than what I have in place.
The VARCHAR column contains full HTML markup. I've removed irrelevant stuff to illustrate how the data I'm interested in is stored:
<table>
<tr>
<td class="key">Dial Result:</td><td id="DialResult" class="value">No Answer</td></tr>
<tr><td class="key">Client Name:</td><td id="ClientName" class="value">SMITH, BOB</td></tr>
<tr><td class="key">Number Dialed:</td><td id="NumberDialed" class="value">5555555555 [Day]</td></tr>
<tr><td class="key">Dial Count:</td><td id="DialCount" class="value">1</td></tr><tr><td class="key">Contact Made:</td><td id="ContactMade" class="value">No</td></tr>
<tr><td class="key">Campaign Called On:</td><td id="CampaignCalledOn" class="value">TEST CAMPAIGN</td></tr>
<tr><td class="key">Call Outcome:</td><td id="CallOutcome" class="value">No answer</td></tr>
<tr><td class="key">Email Sent:</td><td id="EmailSent" class="value">No</td></tr>
<tr><td class="key">Do Not Call Requested:</td><td id="DoNotCallRequested" class="value">No</td></tr>
<tr><td class="key">Product Purchased:</td><td id="ProductPurchased" class="value">No</td></tr>
<tr><td class="key">Order Number:</td><td id="OrderNumber" class="value">N/A</td></tr><tr><td class="key">Order Dollar Value:</td><td id="PurchaseAmount" class="value">0.00</td></tr>
<tr><td class="key">Purchased SKUs:</td><td id="PurchasedSKUs" class="value">N/A</td></tr>
</table>
Currently, I have a stored procedure which uses a cursor to iterate over rows in the "live" database table. The procedure cursor looks like this:
OPEN MainNoteCursor;
FETCH NEXT FROM MainNoteCursor INTO @SequenceNumber,@ClientId,@ContactNumber,@UserDisplayName,@CreatorId,@DateCol,@NoteText
WHILE (@@FETCH_STATUS <> -1)
BEGIN
IF (@@FETCH_STATUS <> -2)
BEGIN TRY
DECLARE @NoteDate date = null
DECLARE @DialResult varchar(255) = null
DECLARE @ClientName varchar(255) = null
DECLARE @NumberDialed varchar(255) = null
DECLARE @DialCount int = null
DECLARE @ContactMade varchar(3) = null
DECLARE @CampaignCalledOn varchar(255) = null
DECLARE @CallOutcome varchar(255) = null
DECLARE @EmailSent varchar(3) = null
DECLARE @DoNotCallRequested varchar(3) = null
DECLARE @ProductPurchased varchar(3) = null
DECLARE @OrderNumber varchar(255) = null
DECLARE @PurchaseAmount money = null
DECLARE @PurchasedSKUs varchar(255) = null
SET @NoteDate = CONVERT(date, @DateCol)
SET @DialResult = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="DialResult" class="value">'',''</td>'',1)
SET @ClientName = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="ClientName" class="value">'',''</td>'',1)
SET @NumberDialed = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="NumberDialed" class="value">'',''</td>'',1)
SET @DialCount = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="DialCount" class="value">'',''</td>'',1)
SET @ContactMade = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="ContactMade" class="value">'',''</td>'',1)
SET @CampaignCalledOn = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="CampaignCalledOn" class="value">'',''</td>'',1)
SET @CallOutcome = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="CallOutcome" class="value">'',''</td>'',1)
SET @EmailSent = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="EmailSent" class="value">'',''</td>'',1)
SET @DoNotCallRequested = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="DoNotCallRequested" class="value">'',''</td>'',1)
SET @ProductPurchased = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="ProductPurchased" class="value">'',''</td>'',1)
SET @OrderNumber = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="OrderNumber" class="value">'',''</td>'',1)
SET @PurchaseAmount = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="PurchaseAmount" class="value">'',''</td>'',1)
SET @PurchasedSKUs = [dbo].[FN_PARSE_TEXT] (@NoteText,''<TD id="PurchasedSKUs" class="value">'',''</td>'',1)
INSERT INTO @Return
([SequenceNumber],[ClientId],[ContactNumber],[UserDisplayName],[CreatorId],
[DateCol],[NoteDate],[NoteText],[DialResult],[ClientName],[NumberDialed],[DialCount],
[ContactMade],[CampaignCalledOn],[CallOutcome],[EmailSent],[DoNotCallRequested],
[ProductPurchased],[OrderNumber],[PurchaseAmount],[PurchasedSKUs])
VALUES
(@SequenceNumber,@ClientId,@ContactNumber,@UserDisplayName,@CreatorId,
@DateCol,@NoteDate,@NoteText,@DialResult,@ClientName,@NumberDialed,@DialCount,
@ContactMade,@CampaignCalledOn,@CallOutcome,@EmailSent,@DoNotCallRequested,
@ProductPurchased,@OrderNumber,@PurchaseAmount,@PurchasedSKUs)
END TRY
BEGIN CATCH
--Nothing to do.
END CATCH
FETCH NEXT FROM MainNoteCursor INTO @SequenceNumber,@ClientId,@ContactNumber,@UserDisplayName,@CreatorId,@DateCol,@NoteText
END
CLOSE MainNoteCursor;
DEALLOCATE MainNoteCursor;
SELECT * FROM @Return
So, you can see that the cursor is parsing the full note body each time, extracting the values between two string delimiters, inserting the values into a table, and then returning the table.
When executed in SSMS, I get a result that looks like this:
The code in the FN_PARSE_TEXT uses CHARINDEX(), LEN(), SUBSTRING(), etc. to read the text between the "Start string" and the "end string". I won't post the full function because there's a lot of housekeeping code that's not relevant, but it can be distilled down to this:
--********************************************************************************
-- CHARINDEX returns the position of the fist character in @StartKey, but we need to
-- start reading after the *last* character in @StartKey. Re-adjust the position
-- at which we''ll start reading the string.
--********************************************************************************
SET @StartKeyIndex = @StartKeyIndex + @StartKeyLength
SET @ReadLength = @EndKeyIndex - @StartKeyIndex
--********************************************************************************
-- Start / End positions have been determined, so now read out the calculated number
-- of characters and return to the calling code.
--********************************************************************************
SET @ReturnValue = LTRIM(RTRIM(SUBSTRING(@noteText, @StartKeyIndex, @ReadLength)))
So, after all of that background information (I can provide more detail, if needed), I'm looking for a better approach.
I've thought of creating an overnight batch process that does all of the parsing during off-hours and then dumping the data into a flattened table. I would then report of the flattened data instead of trying to parse out the details in real time. This, of course, opens its own can of worms and I'd ideally like to continue to hit the live data. Depending on the date range the users supply when running the query, upwards of 10,000 rows may be returned.
Another alternative I've thought of would be to use a CLR method to handle the looping and parsing - but I don't have a lot of experience there and I'm not sure that would be any more efficient than what I'm doing now.
I have read articles (https://sqlperformance.com/2012/07/t-sql-queries/split-strings - for example) that talk about things like CTE, but I don't really have much experience there, so I'm not sure about how to make the leap. Most of the "string parsing" articles I've read are geared towards scenarios where the delimiter is the same - things like parsing comma-separated strings. But since I have "rolling" delimiters, I'm not sure how to deal with that.
Anyway, if you're still with me, any input would be greatly appreciated.
Upvotes: 1
Views: 1810
Reputation: 1080
You can use XML to convert your values and after pivot the result
DECLARE @xml xml = '<table>
<tr>
<td class="key">Dial Result:</td><td id="DialResult" class="value">No Answer</td></tr>
<tr><td class="key">Client Name:</td><td id="ClientName" class="value">SMITH, BOB</td></tr>
<tr><td class="key">Number Dialed:</td><td id="NumberDialed" class="value">5555555555 [Day]</td></tr>
<tr><td class="key">Dial Count:</td><td id="DialCount" class="value">1</td></tr><tr><td class="key">Contact Made:</td><td id="ContactMade" class="value">No</td></tr>
<tr><td class="key">Campaign Called On:</td><td id="CampaignCalledOn" class="value">TEST CAMPAIGN</td></tr>
<tr><td class="key">Call Outcome:</td><td id="CallOutcome" class="value">No answer</td></tr>
<tr><td class="key">Email Sent:</td><td id="EmailSent" class="value">No</td></tr>
<tr><td class="key">Do Not Call Requested:</td><td id="DoNotCallRequested" class="value">No</td></tr>
<tr><td class="key">Product Purchased:</td><td id="ProductPurchased" class="value">No</td></tr>
<tr><td class="key">Order Number:</td><td id="OrderNumber" class="value">N/A</td></tr><tr><td class="key">Order Dollar Value:</td><td id="PurchaseAmount" class="value">0.00</td></tr>
<tr><td class="key">Purchased SKUs:</td><td id="PurchasedSKUs" class="value">N/A</td></tr>
</table>'
Select
Tbl.Col.value('@id','varchar(max)') Id
,Tbl.Col.value('.','varchar(max)') Value
FRom
@xml.nodes('/table/tr/td[@class="value"]') Tbl(Col)
SELECT
DialResult
,ClientName
,NumberDialed
,DialCount
,ContactMade
,CampaignCalledOn
,CallOutcome
,EmailSent
,DoNotCallRequested
,ProductPurchased
,OrderNumber
,PurchaseAmount
,PurchasedSKUs
FROM
(
Select
Tbl.Col.value('@id','varchar(max)') Id
,Tbl.Col.value('.','varchar(max)') Value
FRom
@xml.nodes('/table/tr/td[@class="value"]') Tbl(Col) ) T Pivot
(Max(Value) FOR Id In(DialResult
,ClientName
,NumberDialed
,DialCount
,ContactMade
,CampaignCalledOn
,CallOutcome
,EmailSent
,DoNotCallRequested
,ProductPurchased
,OrderNumber
,PurchaseAmount
,PurchasedSKUs)) p;
Result for first select
Id Value
-------------------------------------------------- --------------------------------------------------
DialResult No Answer
ClientName SMITH, BOB
NumberDialed 5555555555 [Day]
DialCount 1
ContactMade No
CampaignCalledOn TEST CAMPAIGN
CallOutcome No answer
EmailSent No
DoNotCallRequested No
ProductPurchased No
OrderNumber N/A
PurchaseAmount 0.00
PurchasedSKUs N/A
Result for second select
DialResult ClientName NumberDialed DialCount ContactMade CampaignCalledOn CallOutcome EmailSent DoNotCallRequested ProductPurchased OrderNumber PurchaseAmount PurchasedSKUs
-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------
No Answer SMITH, BOB 5555555555 [Day] 1 No TEST CAMPAIGN No answer No No No N/A 0.00 N/A
(1 row(s) affected)
Upvotes: 0
Reputation: 4695
Just to offer a tangible example of how you can do this with XML, here's how I'd do it. I only have one sample document to go off of here, but if you had multiple, the same would apply. You'd just get back multiple rows for each primary key.
;with src (Id, NoteText) as
(
select 1, cast('<table>
<tr>
<td class="key">Dial Result:</td><td id="DialResult" class="value">No Answer</td></tr>
<tr><td class="key">Client Name:</td><td id="ClientName" class="value">SMITH, BOB</td></tr>
<tr><td class="key">Number Dialed:</td><td id="NumberDialed" class="value">5555555555 [Day]</td></tr>
<tr><td class="key">Dial Count:</td><td id="DialCount" class="value">1</td></tr><tr><td class="key">Contact Made:</td><td id="ContactMade" class="value">No</td></tr>
<tr><td class="key">Campaign Called On:</td><td id="CampaignCalledOn" class="value">TEST CAMPAIGN</td></tr>
<tr><td class="key">Call Outcome:</td><td id="CallOutcome" class="value">No answer</td></tr>
<tr><td class="key">Email Sent:</td><td id="EmailSent" class="value">No</td></tr>
<tr><td class="key">Do Not Call Requested:</td><td id="DoNotCallRequested" class="value">No</td></tr>
<tr><td class="key">Product Purchased:</td><td id="ProductPurchased" class="value">No</td></tr>
<tr><td class="key">Order Number:</td><td id="OrderNumber" class="value">N/A</td></tr><tr><td class="key">Order Dollar Value:</td><td id="PurchaseAmount" class="value">0.00</td></tr>
<tr><td class="key">Purchased SKUs:</td><td id="PurchasedSKUs" class="value">N/A</td></tr>
</table>' as xml)
)
select
a.*,
t.c.value('td[1]', 'varchar(max)')
from src a
cross apply a.NoteText.nodes('table/tr') as t(c)
Documentation for XML shredding in SQL can be a little hit or miss, mostly because SQL is implementing xquery which is independent of both SQL and Microsoft. That said, for doing basic stuff, generally the MSDN documentation gets you started pretty well.
Here are a couple good articles
Upvotes: 2
Reputation: 31785
If you can make modifications to the database, I would do anything I could to avoid doing this parsing at query time. Especially if this is a query that if performed often by users for something like a CRM application.
I would either add all of these frequently-selected columns to my table, or I would create a separate table for them that can be linked to the primary key of this table.
Then, ideally, I would populate these columns at INSERT time. Preferably by having the INSERT go through a stored procedure, but if necessary, by putting an Insert Trigger on the table.
If I can't do it at INSERT time, then I'd run a job that does the parsing and populates the columns as often as practicality requires/allows.
With most CRM applications, SELECT performance is more critical than INSERT performance, because it impacts multiple rows, while INSERTS usually happen one at a time. So I would try to shift the performance burden there if at all possible.
DISCLAIMER: I've never played with the XML functions that Xedni references in his comments. So I cannot say what the performance difference would be between my suggestion and his.
Upvotes: 1