Pesche Helfer

Reputation: 526

Parse TSQL scripts and return table and column usage

We are currently involved in a data migration project for which we have to analyze data usage: we need to figure out which tables and columns are actually used and have to be migrated, and which data is obsolete. Not only do we have many SSRS reports based on stored procedures, we also have hundreds if not thousands of T-SQL scripts used for ad hoc analyses. Analyzing them one by one by hand is virtually impossible.

I looked for a way to parse these scripts and return the data they use. I managed to write a macro in EditPad that extracts the tables from a script, but I failed to do the same for columns. The main problems are aliases, CTEs, and even distinguishing column names from the rest of the script.

Unlike a simple regex-based macro, SQL Server must know exactly which columns a script uses, because it has to build an execution plan for it. Is there any way we could use this capability for our purposes? Or, even better, is there a third-party tool that can do what we need? So far I have not been able to find anything.

Thanks very much for any advice!

Upvotes: 4

Views: 3820

Answers (3)

Pesche Helfer

Reputation: 526

A colleague of mine had the clever idea of parsing the execution plan with an XML query.

The execution plan has to be saved as XML and then filtered to reduce the nesting depth, because SQL Server's xml type accepts at most 128 levels. We did the filtering on this website (a scripted alternative is sketched after the step list):

http://xmltoolbox.appspot.com/

1. Paste the XML
2. Add ColumnReference as a filter
3. Format the XML
4. Save the result as a flat file
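
The website step could presumably be scripted as well. A minimal PowerShell sketch of the same filtering, assuming the raw plan was saved from SSMS (file names and paths are assumptions): copy every ColumnReference element under a plain Root element, which keeps the result only two levels deep and therefore well below the 128-level limit.

# Sketch: flatten a saved execution plan so that only the ColumnReference
# elements remain under a single <Root> node (paths are assumptions).
$planPath = 'C:\temp\Herkunftsselect.sqlplan'
$outPath  = 'C:\temp\Herkunftsselect_filtered.xml'

[xml]$plan = Get-Content -Raw $planPath

$ns = New-Object System.Xml.XmlNamespaceManager($plan.NameTable)
$ns.AddNamespace('sp', 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

$out  = New-Object System.Xml.XmlDocument
$root = $out.AppendChild($out.CreateElement('Root'))

foreach ($node in $plan.SelectNodes('//sp:ColumnReference', $ns)) {
    # ImportNode copies the element (and its namespace) into the new document
    $root.AppendChild($out.ImportNode($node, $true)) | Out-Null
}

$out.Save($outPath)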

The filtered XML can be read and processed in SQL:

-- Load the filtered execution plan XML from disk
DECLARE @xml xml = (
       SELECT CAST(BulkColumn AS XML) FROM OPENROWSET(
       BULK 'c:\temp\Herkunftsselect_filtered.xml',
       SINGLE_BLOB) AS ExecPlan
   );

-- Shred every ColumnReference node into one row per database/schema/table/alias/column
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS SP)
SELECT DISTINCT
       'Database' = n.xml.value('./@Database', 'nvarchar(100)'),
       'Schema'   = n.xml.value('./@Schema',   'nvarchar(100)'),
       'Tabelle'  = n.xml.value('./@Table',    'nvarchar(100)'),
       'Alias'    = n.xml.value('./@Alias',    'nvarchar(100)'),
       'Column'   = n.xml.value('./@Column',   'nvarchar(100)')
FROM @xml.nodes('/Root/SP:ColumnReference') n(xml)
-- Exclude internal plan expressions that are not real columns
WHERE n.xml.value('./@Column', 'nvarchar(100)') NOT LIKE 'Expr%'
  AND n.xml.value('./@Column', 'nvarchar(100)') NOT LIKE 'Chk%'
  AND n.xml.value('./@Column', 'nvarchar(100)') NOT LIKE 'Bitmap%'
  AND n.xml.value('./@Column', 'nvarchar(100)') NOT LIKE 'IsBaseRow%'
  AND n.xml.value('./@Column', 'nvarchar(100)') NOT LIKE 'Union%'
  AND n.xml.value('./@Column', 'nvarchar(100)') NOT LIKE 'Segment%'
ORDER BY 1, 2, 3, 4, 5;

Now the only missing piece is fully automating the process: looping through the script files, generating the execution plans, filtering the XML and running the query. One approach my colleague considered is to merge all the script files into one big file (looping through the files and appending them), so that the manual steps only have to be done once.
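
The plan generation itself can also be scripted: with SET SHOWPLAN_XML ON on a connection, SQL Server returns the estimated plan XML instead of executing the batch, so each script can be turned into a plan file without running it. A rough sketch (server, database and folder names are assumptions):

# Sketch: generate an estimated execution plan (XML) for every .sql file in a folder.
# Server, database and paths are assumptions - adjust to your environment.
$conn = New-Object System.Data.SqlClient.SqlConnection('Server=MYSERVER;Database=MYDB;Integrated Security=True')
$conn.Open()

# With SHOWPLAN_XML ON, SQL Server returns the estimated plan instead of executing the batch
(New-Object System.Data.SqlClient.SqlCommand('SET SHOWPLAN_XML ON', $conn)).ExecuteNonQuery() | Out-Null

foreach ($file in Get-ChildItem 'C:\temp\scripts' -Filter *.sql) {
    # GO is an SSMS batch separator, not T-SQL; scripts containing GO would
    # have to be split into batches before being sent like this.
    $cmd  = New-Object System.Data.SqlClient.SqlCommand((Get-Content -Raw $file.FullName), $conn)
    $plan = $cmd.ExecuteScalar()   # first plan in the batch; multi-statement batches return several
    Set-Content -Path (Join-Path 'C:\temp\plans' ($file.BaseName + '.sqlplan')) -Value $plan
}

$conn.Close()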

Upvotes: 1

Ed Harper

Reputation: 21505

A partial answer:

Based on this article, it's possible to use PowerShell (or a compiled .NET language) to call the Microsoft.SqlServer.Management.SqlParser assembly and generate the parse tree for a SQL statement, the same parser used by the Microsoft tools.

# Load the SqlParser assembly that ships with SSMS / SQL Server
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.SqlParser") | Out-Null

# Parser options: batch separator, compatibility level, quoted identifiers, T-SQL version
$ParseOptions = New-Object Microsoft.SqlServer.Management.SqlParser.Parser.ParseOptions
$ParseOptions.BatchSeparator = 'GO'
$ParseOptions.CompatibilityLevel = [Microsoft.SqlServer.Management.SqlParser.Common.DatabaseCompatibilityLevel]::Current
$ParseOptions.IsQuotedIdentifierSet = $true
$ParseOptions.TransactSqlVersion = [Microsoft.SqlServer.Management.SqlParser.Common.TransactSqlVersion]::Current

set-psdebug -strict

# Alternatively, read the SQL to parse from a file passed as the first argument:
#$SQL=get-content $Args[0] |foreach-object{"$_`n"}

$SQL = "SELECT  c.COLUMN_NAME,
        c.TABLE_NAME, 
        t.TABLE_SCHEMA,
        t.TABLE_TYPE,
        t.TABLE_NAME AS t2
FROM INFORMATION_SCHEMA.TABLES AS t
JOIN INFORMATION_SCHEMA.COLUMNS AS c
ON c.TABLE_NAME = t.TABLE_NAME
AND c.TABLE_SCHEMA = t.TABLE_SCHEMA
AND c.TABLE_CATALOG = t.TABLE_CATALOG
"

$Script=[Microsoft.SqlServer.Management.SqlParser.Parser.Parser]::Parse($SQL, $ParseOptions)

# The parse result's Script property is not public, so read it via reflection
# and dump the parse tree as XML
$flags = [System.Reflection.BindingFlags]::Instance -bor [System.Reflection.BindingFlags]::NonPublic
$Script.GetType().GetProperty("Script", $flags).GetValue($Script).Xml

I have a local version of SSMS 2016 and SQL Server 2014 installed, and this worked without installing anything else.

This still leaves quite a lot of work to do in terms of interpreting the tree, but you might be able to make use of it.
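
For a rough first pass over the tree, the XML printed by the last line can be loaded back and scanned for nodes that look like table or column references. The name patterns below ('ColumnRef', 'TableRef') are an assumption about the element names SqlParser emits; verify them against your own output:

# Rough first pass over the parse tree XML: list nodes whose element names look
# like column or table references. The name patterns are assumptions - check them
# against the XML your SqlParser version actually produces.
[xml]$tree = $Script.GetType().GetProperty("Script", $flags).GetValue($Script).Xml

$tree.SelectNodes('//*') |
    Where-Object { $_.Name -like '*ColumnRef*' -or $_.Name -like '*TableRef*' } |
    ForEach-Object { $_.OuterXml } |
    Select-Object -Unique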

Upvotes: 5

Philip Kelley

Reputation: 40359

Parsing tables and columns out of execution plans may be possible, but it can't be simple to do. (I'll be watching this post in case someone spells it out.) Third-party tools might be able to do this; that's another thing to watch for. From my experience, I suspect it may not be feasible, at least not to the full extent that you need.

A lateral approach: can you arrange to "run everything" against the database, so that you know every possible attempt to access your data has been made? If so:

  • Create a copy of the database.
  • Include only as little data as is necessary for the queries to work.
  • (So, either backup/restore, or use SSMS to "script out" the database.)
  • Configure security so that the login used for testing can access nothing.
  • Run the first query. It will fail. Determine which tables and columns it needs access to, and grant access to only those tables and columns.
  • Run the next query. Repeat.
  • Odds are you can run batches of queries and add tables/columns en masse (a rough loop for this is sketched after the list).
  • Save time by enabling the known/obvious tables first.
  • Once all queries run successfully because all the relevant tables and columns are enabled, you have your minimal set.
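
One way to batch the grant-and-retry cycle (connection string, login and folder are assumptions) is to run every script as the restricted login and collect the permission errors, whose text names the objects that still need grants; only the first error per script surfaces, so the loop needs a few passes as grants are added:

# Sketch: run each script as the locked-down login and log permission errors,
# which identify the tables/columns that still need to be granted.
# Connection string, login and folder are assumptions.
$connStr = 'Server=MYSERVER;Database=MYDB_COPY;User ID=audit_login;Password=<password>'

foreach ($file in Get-ChildItem 'C:\temp\scripts' -Filter *.sql) {
    $conn = New-Object System.Data.SqlClient.SqlConnection($connStr)
    $conn.Open()
    $cmd = New-Object System.Data.SqlClient.SqlCommand((Get-Content -Raw $file.FullName), $conn)
    try {
        $cmd.ExecuteNonQuery() | Out-Null
        "$($file.Name): OK"
    }
    catch {
        # e.g. "The SELECT permission was denied on the object ..." - note it, grant, rerun
        "$($file.Name): $($_.Exception.Message)"
    }
    $conn.Close()
}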

The thing is: having written that out, I see so many potential caveats, edge cases and gotchas (which depend entirely on what you have to work with: what about views? Triggers? Synonyms?) that I have to question whether it's worth the effort. If you're certain you'll cut out half the database, go for it; for a 10% reduction it might not be worth the effort. (For a 10% reduction, try just renaming the most suspect tables and see what happens.)

Upvotes: 3
