John Dibling

Reputation: 101484

Is LTRIM(RTRIM(COALESCE(TextField,''))) Bad?

I have a very high-traffic table with a char(50) field which takes part in several indexes. This char(50) field allows NULLs, and in that case a NULL value is considered the same as a non-NULL, zero-length string for my purposes.

I also disregard leading & trailing whitespace and, while I scrub the data before I insert it, it may also be inserted by means beyond my control.

I have a sproc that is used to copy data from one table to the main table, and it needs to be high-performance. I need to delete duplicate records before inserting the new data and I am using the method discussed in this thread to perform the deletes.

My delete statement looks like this (simplified):

delete t
from masterTable t
    join incomingDataTable inc on
    (
        LTRIM(RTRIM(COALESCE(inc.TextField,''))) =
             LTRIM(RTRIM(COALESCE(t.TextField,'')))
    )
where LTRIM(RTRIM(COALESCE(t.TextField,''))) <> ''

I have read that constructs like LTRIM(RTRIM(...)) are bad. Can my delete statement be improved, and if so, how?

EDIT: Just to clarify, TextField does take part in indexes on both tables.

EDIT 2: TextField is defined as char(50) in both tables. It is not of type TEXT.

Upvotes: 1

Views: 7409

Answers (6)

SQLMenace

Reputation: 135111

It is bad because your JOIN will have to scan the whole index: wrapping the column in functions makes the condition non-SARGable, so the optimizer cannot do an index seek.
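
To see the difference, compare the two predicates below; this is an illustrative sketch against the question's masterTable, and the literal value is made up:

-- with an index on TextField, the first predicate is SARGable and can
-- use an index seek; the second wraps the column in functions, so SQL
-- Server must evaluate the expression for every row and scan the index
select * from masterTable where TextField = 'some value'

select * from masterTable where LTRIM(RTRIM(TextField)) = 'some value'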

Also, are you sure it is a TEXT datatype? Last I checked, you could not use LTRIM or RTRIM against a TEXT column.

In response to the char-versus-varchar comment, run this:

declare @v varchar(50), @v2 char(50)
select @v = 'a', @v2 = 'a'

-- returns 1 for the varchar and 50 for the char,
-- because char is always padded to its declared length
select datalength(@v), datalength(@v2)

Upvotes: 4

HLGEM

Reputation: 96600

If you have to trim the data every time you use it, this should NOT be a char datatype but a varchar datatype. Any time you have to apply a function every time you query a field, something is wrong with your database design.

You might find this discussion helpful: Is the CHAR datatype in SQL obsolete? When do you use it?

Upvotes: 2

marc_s

Reputation: 755013

I would recommend changing that datatype to VARCHAR(50). Up to about 10 characters, CHAR(x) might make sense since it's a tad faster and has less overhead, but at 50 characters, unless nearly all the values actually fill the full 50 chars, it is a major overhead, especially since this column is also used in indices.

Changing it to VARCHAR(50) could reduce the space needed for the table quite significantly (depending on your amount of data and how much of the 50 chars is really used), and all the indices involved would also get a lot smaller - plus you wouldn't need this COALESCE, LTRIM, RTRIM stuff anymore :-)
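
A rough sketch of what that conversion could look like; the index name IX_masterTable_TextField is hypothetical, and any index containing the column has to be dropped before the column can be altered:

-- hypothetical index name; indexes containing the column must go first
drop index IX_masterTable_TextField on masterTable

alter table masterTable alter column TextField varchar(50) null

-- char values keep their trailing padding after the conversion, so trim once
update masterTable set TextField = LTRIM(RTRIM(TextField))

create index IX_masterTable_TextField on masterTable (TextField)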

Marc

Upvotes: 3

Quassnoi

Reputation: 425643

You need to:

  1. Create a computed column on masterTable using the expression LTRIM(RTRIM(COALESCE(TextField,''))),
  2. Build an index on this column, and
  3. Use this column in the join, as sketched below.
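
A minimal sketch of those steps, assuming SQL Server 2005 or later; the names TextFieldClean and IX_masterTable_TextFieldClean are made up:

-- computed column over the cleaned value; LTRIM, RTRIM and COALESCE
-- are deterministic, so the column can be indexed
alter table masterTable
    add TextFieldClean as LTRIM(RTRIM(COALESCE(TextField,''))) persisted

create index IX_masterTable_TextFieldClean on masterTable (TextFieldClean)

-- the join can now seek on the index; only the incoming side still
-- computes the expression per row
delete t
from masterTable t
    join incomingDataTable inc on
        LTRIM(RTRIM(COALESCE(inc.TextField,''))) = t.TextFieldClean
where t.TextFieldClean <> ''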

As your table is designed now, it's quite impossible to make this query index-friendly.

If you cannot change your table structure but can estimate the number of LEADING spaces, you may use an approach described here.

This solution, however, is not nearly as efficient as creating an index on a computed column.

Upvotes: 8

beach

Reputation: 8640

I believe that SQLMenace is correct.

How about adding an INSERT/UPDATE trigger to the table to guarantee that there is no whitespace on that column?
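
Something along these lines, as a hedged sketch; the trigger name and the id key column are assumptions, since the question doesn't show the table's key:

-- hypothetical trigger and key column (id); trims the column whenever
-- rows are inserted or updated, so queries never have to
create trigger trg_masterTable_TrimTextField
on masterTable
after insert, update
as
begin
    set nocount on

    -- direct trigger recursion is off by default, so this self-update
    -- will not re-fire the trigger
    update t
    set TextField = LTRIM(RTRIM(t.TextField))
    from masterTable t
        join inserted i on i.id = t.id
end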

If the column is VARCHAR, SQL Server will automatically ignore trailing whitespace in comparisons. Leading whitespace still counts, though.

Actually, wouldn't SQL Server automatically pad both columns to CHAR(50) before doing the JOIN? (Implicit conversions.)

Upvotes: 2

cjk

Reputation: 46465

I think in SQL Server you can match a padded string against a non-padded string, saving yourself the LTRIM/RTRIM effort; however, I'm not 100% sure on that...

However, tidying data is all part of ETL and needs to be done before your data gets to where it is going. You may find on large datasets that it is quicker to create a temporary copy of the data, reprocess it, index it, and then do the required matching.
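
A rough sketch of that staging approach; #cleanIncoming is a hypothetical temp table, with only the relevant column shown:

-- scrub the incoming data once into an indexed temporary copy
select LTRIM(RTRIM(COALESCE(TextField,''))) as TextField
into #cleanIncoming
from incomingDataTable

create index IX_cleanIncoming_TextField on #cleanIncoming (TextField)

-- match against the scrubbed copy; the master side still computes the
-- expression per row unless it is cleaned too (or given a computed
-- column as in Quassnoi's answer)
delete t
from masterTable t
    join #cleanIncoming c
        on c.TextField = LTRIM(RTRIM(COALESCE(t.TextField,'')))
where c.TextField <> ''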

Upvotes: 2
