Jim the Frayed
Jim the Frayed

Reputation: 158

Identifying T-SQL Reserved/Lexical Words in TSql110Parser parse tree

I am using the TSql110Parser to list Identifier and QuotedIdentifier tokens in a particular T-SQL procedure (and all modules, eventually). The mission is to assemble a list of column(s) and their table(s) that are used in each module so we can identify obsolete columns, tables and modules in the database on a scripted basis.

I have validated that no user objects in my database use currently-defined T-SQL reserved words as object names.

The Question: is there a way to detect when an Identifier is part of the T-SQL lexicon? That is, from the information contained in tree.ScriptTokenStream[idx], can it be determined whether or not tree.ScriptTokenStream[idx].Text is a T-SQL lexical word? Reserved words appear to all have a `tree.ScriptTokenStream[idx].TokenType' of something other than "Identifier", so they're already excluded. The words I want eliminate are words like "NOCOUNT", "VARCHAR", "LTRIM" and "RTRIM", which all have the TokenType "Identifier".

Related Bonus Question: This list currently outputs to the console. Any tips for rerouting the tokens output to CRLF-delimited text in a SQL Server CLR?

I am a complete newbie with C#, but have been coding for a long time, so any help you could provide needs to be pretty low-level from a C# syntax perspective, please.

My environment is SQL Server 2008R2 / C# in VS10 / .NET 4.5.2.

Thanks for your attention and help.

Here's the program:

  // Program.cs
  // Cloned and kludged from Program.cs, found in the BasicUsage project, available from the Samples download link in 
  //   http://blogs.msdn.com/b/arvindsh/archive/2013/11/06/slides-and-samples-for-my-sql-pass-scriptdom-talk.aspx
  //
  // See also: http://michaeljswart.com/2014/04/removing-comments-from-sql/
  // 
  // The page that VS10 "Help" should reference instead of the useless one it does:
  // https://msdn.microsoft.com/en-us/library/kx37x362(v=vs.100).aspx


  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Text;
  using System.IO;
  using Microsoft.SqlServer.TransactSql.ScriptDom;

  namespace BasicUsage
  {
      class Program
      {
          static void Main(string[] args)
          {
              // before proceeding, add a reference to the ScriptDom assembly

              IList<ParseError> errors = null;

              //TextReader rdr = new StreamReader(@"C:\ScriptDom\SampleProc.sql");
              TextReader rdr = new StreamReader(@"C:\ScriptDom\pTestProc.sql");

              // pass the reader to the scriptdom
              TSql110Parser parser = new TSql110Parser(true);
              TSqlFragment tree = parser.Parse(rdr, out errors);

              // some placeholders to avoid typing!
              foreach (ParseError err in errors)
              {
                  Console.WriteLine(err.Message);
              }

              string strtokentype ;
              string strtokentext ;
              int strtokentextlen ;
              int tokencount;
              int identifiercount = 0;

              tokencount = tree.ScriptTokenStream.Count;  // 249

              Console.WriteLine(tree.ScriptTokenStream.Count);
              Console.WriteLine("====== Listing only Identifiers ======");

              // walk through the tokens
              int idx = 0;
              for (idx=0; idx < tokencount; idx++ )  // remember: zero-based arrays here.
              {
                  //if (String.Equals(tree.ScriptTokenStream[idx].TokenType, "QuotedIdentifier", StringComparison.OrdinalIgnoreCase) = true ||
                  //    String.Equals(tree.ScriptTokenStream[idx].TokenType, "Identifier", StringComparison.OrdinalIgnoreCase) = true)

                  // Make string ops easier by doing the conversion only once, and operating on a string basis thereafter.
                  strtokentype = Convert.ToString(tree.ScriptTokenStream[idx].TokenType);


                  // if it's a quoted identifier, remove the first and last character, e.g. "[Rate]" becomes "Rate".
                  if (strtokentype == "QuotedIdentifier" ||
                      strtokentype == "Identifier"
                     )
                  {
                      identifiercount++;

                      // do the conversion first,
                      strtokentext =  Convert.ToString(tree.ScriptTokenStream[idx].Text);

                      // then extract the meaningful part if needed.
                      if (strtokentype == "QuotedIdentifier")
                      {
                          strtokentextlen = strtokentext.Length - 2;
                          strtokentext = strtokentext.Substring(1, strtokentextlen);
                      }
                      else
                      {
                          strtokentype = "      Identifier";  // Provide leading spaces to clean up the output text for a nicer presentation.
                      }

                      Console.Write("[" + idx + "] = " + strtokentype);
                      Console.WriteLine("  " + strtokentext);
                      Console.WriteLine();

                  }
              };
              Console.WriteLine();
              Console.WriteLine(identifiercount + "  Identifiers found.");
              Console.WriteLine();

              rdr.Dispose();  // Set breakpoint here so console remains visible during development
          }
      }
  }

Upvotes: 1

Views: 588

Answers (1)

Jim the Frayed
Jim the Frayed

Reputation: 158

Turns out that the consensus among the MVP's I've asked is that there really isn't a comprehensive list of T-SQL keywords available.

However, an XML file with many of the keywords is created when you install SSMS. The file lives on your machine at
      C:\Program Files (x86)\Microsoft SQL Server\120\Tools\Binn\ManagementStudio\SqlToolsData\1033\SqlCommonObjects.xml

Those, combined with the reserved words (see the link in the question above) compose a list of over 600 distinct words [ODBC and Future use words included] that do or could have "special meaning" to T-SQL.

As an aside: Note that keywords differ from reserved words in that keywords can legally, if unwisely, be used to create valid DDL, e.g.
      CREATE TABLE varchar(varchar varchar NOT NULL)
Perfectly valid, but over the top in obfuscation.

Upvotes: 1

Related Questions