Reputation: 526
We are currently involved in a data migration project, for which we have to analyze data usage: we need to figure out which tables and columns are actually used (and have to be migrated) and which data is obsolete. Not only do we have many SSRS reports based on stored procedures, we also have hundreds if not thousands of T-SQL scripts used for ad hoc analyses. Analyzing them one by one by hand is virtually impossible.
I looked for a way to parse these scripts and return the data they use. I managed to write a macro in EditPad which extracts table names from a script, but I failed to do the same for columns. The main problems are aliases, CTEs, and even telling column names apart from other parts of the syntax.
In contrast to a simple regex-based macro, SQL Server itself must know exactly which columns a script uses; that knowledge surfaces in the execution plan. Is there any way we could use this capability for our purposes? Or, even better, is there any third-party tool that could do what we need? So far I could not find anything.
Thanks very much for any advice!
Upvotes: 4
Views: 3820
Reputation: 526
A colleague of mine had the clever idea of parsing the execution plan using an XML query:
The execution plan has to be saved as XML and then filtered on a website to reduce the nesting depth, because SQL Server's xml data type cannot store documents nested more than 128 levels deep:
http://xmltoolbox.appspot.com/
1. Paste the XML
2. Add ColumnReference as a filter
3. Format the XML
4. Save the result as a flat file
The filtered XML can be read and processed in SQL:
DECLARE @xml xml = (
    SELECT CAST(BulkColumn AS XML)
    FROM OPENROWSET(
        BULK 'c:\temp\Herkunftsselect_filtered.xml',
        SINGLE_BLOB) AS ExecPlan
);

WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS SP)
SELECT DISTINCT
    'Database' = n.xml.value('./@Database','nvarchar(100)'),
    'Schema'   = n.xml.value('./@Schema','nvarchar(100)'),
    'Tabelle'  = n.xml.value('./@Table','nvarchar(100)'),
    'Alias'    = n.xml.value('./@Alias','nvarchar(100)'),
    'Column'   = n.xml.value('./@Column','nvarchar(100)')
FROM @xml.nodes('/Root/SP:ColumnReference') n(xml)
WHERE n.xml.value('./@Column','nvarchar(100)') NOT LIKE 'Expr%'
  AND n.xml.value('./@Column','nvarchar(100)') NOT LIKE 'Chk%'
  AND n.xml.value('./@Column','nvarchar(100)') NOT LIKE 'Bitmap%'
  AND n.xml.value('./@Column','nvarchar(100)') NOT LIKE 'IsBaseRow%'
  AND n.xml.value('./@Column','nvarchar(100)') NOT LIKE 'Union%'
  AND n.xml.value('./@Column','nvarchar(100)') NOT LIKE 'Segment%'
ORDER BY 1,2,3,4,5;
Now the only missing part is fully automating the process: looping through the script files, generating the execution plans, filtering the XML, and running the query. One idea my colleague considered is to concatenate all the script files into one big file (loop through the files and append them), so that the manual steps would only have to be done once.
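The manual website step itself can probably be scripted too. As a sketch (not part of my colleague's solution): Python's ElementTree has no 128-level nesting limit, so it can pull the ColumnReference attributes straight out of a saved plan. The file-looping part is omitted here, and the plan XML is inlined as a made-up sample; the showplan namespace is the real one used above.

```python
# Sketch: filter ColumnReference nodes out of a showplan XML in Python
# instead of the manual website step. Table/column values below are
# hypothetical sample data, not from a real plan.
import xml.etree.ElementTree as ET

NS = "{http://schemas.microsoft.com/sqlserver/2004/07/showplan}"
NOISE = ("Expr", "Chk", "Bitmap", "IsBaseRow", "Union", "Segment")

def column_references(plan_xml):
    """Return distinct (database, schema, table, alias, column) tuples."""
    refs = set()
    for node in ET.fromstring(plan_xml).iter(NS + "ColumnReference"):
        col = node.get("Column", "")
        if col.startswith(NOISE):  # internal expressions, not real columns
            continue
        refs.add((node.get("Database"), node.get("Schema"),
                  node.get("Table"), node.get("Alias"), col))
    return refs

sample = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <ColumnReference Database="[AdventureWorks]" Schema="[Sales]"
                   Table="[Orders]" Alias="[o]" Column="OrderID"/>
  <ColumnReference Column="Expr1002"/>
</ShowPlanXML>"""

print(sorted(column_references(sample)))
```

From there the tuples could be bulk-inserted into a staging table, which would replace the OPENROWSET step as well.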
Upvotes: 1
Reputation: 21505
A partial answer:
Based on this article, you can use PowerShell (or a compiled .NET language) with the Microsoft.SqlServer.Management.SqlParser assembly to generate the parse tree for a SQL statement, using the same parser the Microsoft tools use.
# Load the parser assembly that ships with SSMS / SQL Server
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.SqlParser") | Out-Null

$ParseOptions = New-Object Microsoft.SqlServer.Management.SqlParser.Parser.ParseOptions
$ParseOptions.BatchSeparator = 'GO'
$ParseOptions.CompatibilityLevel = [Microsoft.SqlServer.Management.SqlParser.Common.DatabaseCompatibilityLevel]::Current
$ParseOptions.IsQuotedIdentifierSet = $true
$ParseOptions.TransactSqlVersion = [Microsoft.SqlServer.Management.SqlParser.Common.TransactSqlVersion]::Current

Set-PSDebug -Strict

# To parse a script file instead, pass its path as the first argument:
#$SQL = Get-Content $Args[0] | ForEach-Object { "$_`n" }
$SQL = "SELECT c.COLUMN_NAME,
c.TABLE_NAME,
t.TABLE_SCHEMA,
t.TABLE_TYPE,
t.TABLE_NAME AS t2
FROM INFORMATION_SCHEMA.TABLES AS t
JOIN INFORMATION_SCHEMA.COLUMNS AS c
ON c.TABLE_NAME = t.TABLE_NAME
AND c.TABLE_SCHEMA = t.TABLE_SCHEMA
AND c.TABLE_CATALOG = t.TABLE_CATALOG
"

$Script = [Microsoft.SqlServer.Management.SqlParser.Parser.Parser]::Parse($SQL, $ParseOptions)

# The parse tree sits behind a non-public "Script" property, so use reflection
# to reach it and dump the tree as XML
$flags = [System.Reflection.BindingFlags]::Instance -bor [System.Reflection.BindingFlags]::NonPublic
$Script.GetType().GetProperty("Script", $flags).GetValue($Script).Xml
I have a local version of SSMS 2016 and SQL Server 2014 installed, and this worked without installing anything else.
This still leaves quite a lot of work to do in terms of interpreting the tree, but you might be able to make use of it.
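To get a feel for what interpreting the tree might look like, here is a rough Python sketch that scans the XML dump for table and column reference nodes. Both the element names (modeled on the parser's SqlCodeDom class names, SqlTableRefExpression and SqlColumnRefExpression) and the miniature sample are my assumptions, not the parser's verified output; check them against a real dump first.

```python
# Sketch: pull table/column reference nodes out of a parse-tree XML dump.
# Element names are assumed, not verified against the actual parser output.
import xml.etree.ElementTree as ET

def references(tree_xml):
    """Collect (tag, flattened identifier text) pairs for reference nodes."""
    hits = []
    for node in ET.fromstring(tree_xml).iter():
        if node.tag.endswith(("TableRefExpression", "ColumnRefExpression")):
            # identifier parts may be nested; itertext() flattens them
            hits.append((node.tag, "".join(node.itertext()).strip()))
    return hits

# Hypothetical miniature of a parse-tree dump, for illustration only:
sample = """<SqlSelectStatement>
  <SqlTableRefExpression>INFORMATION_SCHEMA.TABLES</SqlTableRefExpression>
  <SqlColumnRefExpression>c.COLUMN_NAME</SqlColumnRefExpression>
</SqlSelectStatement>"""

print(references(sample))
```

Resolving aliases back to their base tables would still be up to you, but at this point you are walking a proper tree rather than fighting regexes.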
Upvotes: 5
Reputation: 40359
Parsing tables and columns out of execution plans may be possible, but it can't be simple to do. (I'll be watching this post in case someone spells it out.) Third-party tools might be able to do this; that's another thing to watch for. From my experience, I suspect this may not be feasible, at least not to the full extent that you need.
A lateral approach: can you arrange to “run everything” against the database, so that you know every possible attempt to access your data has actually been made? If so, anything that was never touched becomes a candidate for leaving behind.
The thing is… having written that, I see so many potential caveats, fringe cases and gotchas (which totally depend upon what you have to work with: what about views? Triggers? Synonyms?) that I have to question whether it's worth the effort. If you're certain you'll cut out half the database, go for it; for a 10% reduction it might not be worth the effort. (For a 10% reduction, try just renaming the most suspect tables and see what breaks.)
Upvotes: 3