KJQ
KJQ

Reputation: 59

Searching SQL Server database for control characters

I'm looking to search for unwanted control characters in a MSSQL database.

I currently use a stored procedure that gets created against a database I need to search, but this will only work when searching for a simple character or string of characters. See below for the procedure as it stands (This was first gathered from this site)

CREATE PROC SearchAllTables
(
@SearchStr nvarchar(100)
)
AS
BEGIN

-- Creates a Stored Procedure for the database
-- When running the procedure, set the @SearchStr parameter to the character you are searching for

CREATE TABLE #Results (ColumnName nvarchar(370), ColumnValue nvarchar(3630))

SET NOCOUNT ON

DECLARE @TableName nvarchar(256), @ColumnName nvarchar(128), @SearchStr2 nvarchar(110)
SET  @TableName = ''
SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%','''')

WHILE @TableName IS NOT NULL
BEGIN
SET @ColumnName = ''
SET @TableName = 
(
    SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME))
    FROM    INFORMATION_SCHEMA.TABLES
    WHERE       TABLE_TYPE = 'BASE TABLE'
        AND QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) > @TableName
        AND OBJECTPROPERTY(
                OBJECT_ID(
                    QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
                     ), 'IsMSShipped'
                       ) = 0
)

WHILE (@TableName IS NOT NULL) AND (@ColumnName IS NOT NULL)
BEGIN
    SET @ColumnName =
    (
        SELECT MIN(QUOTENAME(COLUMN_NAME))
        FROM    INFORMATION_SCHEMA.COLUMNS
        WHERE       TABLE_SCHEMA    = PARSENAME(@TableName, 2)
            AND TABLE_NAME  = PARSENAME(@TableName, 1)
            AND DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar')
            AND QUOTENAME(COLUMN_NAME) > @ColumnName
    )

    IF @ColumnName IS NOT NULL
    BEGIN
        INSERT INTO #Results
        EXEC
        (
            'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + ', 3630) 
            FROM ' + @TableName + ' (NOLOCK) ' +
            ' WHERE ' + @ColumnName + ' LIKE ' + @SearchStr2
        )
END
END 
END

SELECT ColumnName, ColumnValue FROM #Results
END

Now, I need to alter this to allow me to search for a list of control characters:

    '%['
                      + CHAR(0)+CHAR(1)+CHAR(2)+CHAR(3)+CHAR(4)
                      + CHAR(5)+CHAR(6)+CHAR(7)+CHAR(8)+CHAR(9)
                      + CHAR(10)+CHAR(11)+CHAR(12)+CHAR(13)+CHAR(14)
                      + CHAR(15)+CHAR(16)+CHAR(17)+CHAR(18)+CHAR(19)
                      + CHAR(20)+CHAR(21)+CHAR(22)+CHAR(23)+CHAR(24)
                      + CHAR(25)+CHAR(26)+CHAR(27)+CHAR(28)+CHAR(29)
                      + CHAR(30)+CHAR(31)+CHAR(127)
                      + ']%',

Now the procedure as it stands won't allow me to use this as a search string, and it won't search correctly even using a single control character e.g. CHAR (28)

USE [DBNAME]
GO

DECLARE @return_value int

EXEC    @return_value = [dbo].[SearchAllTables]
        @SearchStr = N'CHAR (28)'

SELECT  'Return Value' = @return_value

GO

Removing the N'' from the @SearchStr in the example above results in the error message:

Incorrect syntax near '28'

Can anyone help with a way of adapting this procedure to allow the search of control characters?

Upvotes: 1

Views: 2126

Answers (3)

clifton_h
clifton_h

Reputation: 1298

Concerns:

  • Performance

As you probably know, wildcards (%) at the beginning and end of the argument prevent your SARG from using any indexes at all (even if it claims to use an INDEX SCAN) as SQL Server has no idea where the values will be. In a worst case scenario, it might even look in the wrong areas!

More grievous, the last EXEC statement you fire off will make SQL Server run through hoops. Despite what you might think, SQL Server initializes variables at the time of execution. Meaning, the optimizer will be running with its bed-clothes still on while it is in the middle of executing the query plan and may end up changing several times!

An example of what might be unleashed occurred on one of my DBs a month ago, where a terrible new plugin ran a simple query looking for one row with just two badly parameterized predicates on a large table of 1 million rows. Yet, the Optimizer swallowed up trillions of IOs in a matter of seconds (the query came and went too fast for a governor) and sent 2 billion rows PER QUERY through the network. Tragically, the issue was zombied that day, and with just 500 one-row result sets in my database running repeatedly, it brought down our server.

  • Isolation and Transactions

Guessing haphazardly, expect to have locking issues and swallowed up resources. Major operations like UPDATES, and REINDEXING, and ALTER statements will either be forced to wait or kick your query to the curbside. Even using READ UNCOMMITTED will not save you from some blocking issues.

A New Approach

All of those characters you listed are neither letters nor numbers, but meaningless garbage (to SQL Server) that flows in from a front end application. I noticed you excluded Microsoft System Tables, so where does your data flow come from and how is it disseminated throughout the database? Who is at fault? How does the system, user, and designer play a role in the mess?

Is this Server an OLTP or READ heavy? Does your org not have a capable SSIS, ETL system to prevent garbage from wreaking havoc on your server?

  • Database Constraints

Assuredly, what reason does your application fail to pre-cleanse the data before sending it? And when it does get to the database level, why can we not use both the DATA TYPE and TABLE CONSTRAINTS to our advantage? Simple solutions like using DATE instead of VARCHAR for storing dates, adding normalization instead of storing blobs to isolate the read-heavy tables from write-heavy can spell wonders to improvement.

Admittingly, using CHECK CONSTRAINTS can lead to an exponential degradation of performance on your INSERT statements, so you may need to think about the larger impact.

Preventative vs Prescriptive

Ostensibly, I could write a query that would solve your current question (encapsulating EXEC statements in another Stored Proc enables proper parameter sniffing), we need to ask more and write less code. Your Procedure is terrible now and will always be, even if we window-dress. It masks the real issue of how those control characters got there in the first place and forces expensive queries on your poor system.

How relationally your tables work, normalization, cardinality should mean something to you so you can discriminate between not only types of tables but those specific columns they possess. Your current trouble would be disastrous on many of my databases, which can reach 1.5+ Terabytes in size

The more you gather your requirements, the better your answer will be. Heck, even setting up a database entirely for ETL would be superior than your current solution. And even if you still end up running a similar query, at least you will have shortened your list of columns and tables to a minute, understandable list instead of blindly inflicting pain on everyone in your company.

Best of wishes!

Upvotes: 1

John Cappelletti
John Cappelletti

Reputation: 81970

I would opt for a dynamic CharIndex(). Consider the following

Declare @ColumnName varchar(25)='[SomeField]'
Declare @SearchFor  nvarchar(max) ='CHAR(0),CHAR(1),CHAR(2),CHAR(3),CHAR(4),CHAR(5),CHAR(6),CHAR(7),CHAR(8),CHAR(9),CHAR(10),CHAR(11),CHAR(12),CHAR(13),CHAR(14),CHAR(15),CHAR(16),CHAR(17),CHAR(18),CHAR(19),CHAR(20),CHAR(21),CHAR(22),CHAR(23),CHAR(24),CHAR(25),CHAR(26),CHAR(27),CHAR(28),CHAR(29),CHAR(30),CHAR(31),CHAR(127)'
Set @SearchFor = 'CharIndex('+Replace(@SearchFor,',',','+@ColumnName+')+CharIndex(')+','+@ColumnName+')'

So Your Dynamic where would look something like this

' WHERE ' + @SearchFor + '>0'

Just for illustration, the @SearchFor string would look like this

CharIndex(CHAR(0),[SomeField])+CharIndex(CHAR(1),[SomeField])+...+CharIndex(CHAR(31),[SomeField])+CharIndex(CHAR(127),[SomeField])

Upvotes: 2

iamdave
iamdave

Reputation: 12243

It looks like QUOTENAME is what is breaking things for you. When you try to use certain characters - such as char(0) - it returns NULL. Because of this, you are probably better off manually putting the single quotes yourself.

This means you would want to change this part:

    INSERT INTO #Results
    EXEC
    (
        'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + ', 3630) 
        FROM ' + @TableName + ' (NOLOCK) ' +
        ' WHERE ' + @ColumnName + ' LIKE ' + @SearchStr2
    )

to this:

    INSERT INTO #Results
    EXEC
    (
        'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + ', 3630) 
        FROM ' + @TableName + ' (NOLOCK) ' +
        ' WHERE ' + @ColumnName + ' LIKE ''' + @SearchStr + '''  -- Note the use of @SearchStr (Not @SearchStr2) and the additional quotes to wrap your search string in. 
    )

Which should allow you to use your %[...]% pattern matching syntax.

Upvotes: 1

Related Questions