Yaqub Ahmad
Yaqub Ahmad

Reputation: 27659

Find a string within a string - SQL takes too much time

I have two tables:

Table 1: CustomerEmails:

CREATE TABLE [dbo].[CustomerEmails](
    [id]            [int] IDENTITY(1,1) NOT NULL,
    [datecreated]   [datetime] NULL,
    [UID]           [nvarchar](250) NULL,
    [From]          [nvarchar](100) NULL,
    [To]            [nvarchar](100) NULL,
    [Subject]       [nvarchar](max) NULL,
    [Body]          [nvarchar](max) NULL,
    [Dated]         [datetime] NULL,
 CONSTRAINT [PK_CustomerEmails] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
)

Table 2: CustomerEmailIds:

CREATE TABLE [dbo].[CustomerEmailIds](
    [Email] [varchar](200) NULL
) ON [PRIMARY]

You may be thinking that why i am not using the EmailId in table CustomerEmails instead of using the emails itself? what i mean is that i can add a column EmailId (INT) to the table CustomerEmailIds & then i can refer that column instead of CustomerEmails.From & CustomerEmails.To?? Guys the issue is the table CustomerEmails is used by some other application & that application just keeps the track of emails sent/received through OutLook. The table CustomerEmailIds have customer emails & these emails come into system from the application i am working on.

So the Table CustomerEmails have 7,00,000+ records while table CustomerEmailIds have 1,00,000+ records.

I need to find out the emails from table CustomerEmails, based on the emails in CustomerEmailIds table.

The query i am using is:

SELECT  
        e.*
FROM    CustomerEmails e 
WHERE   EXISTS 
        (
            SELECT  Email 
            FROM    CustomerEmailIds c 
            WHERE   ( ISNULL(e.[From],'') + '/' + ISNULL(e.[To],'') ) LIKE  '%'+c.Email+'%' 
        )

Some facts:
1- I am using SQL Server 2008
2- Sorry guys, i forgot to mention that the CustomerEmails.**To** can contains multiple comma separated emails like: [email protected],[email protected],[email protected]
3- Because of the fact2 the c.Cs3Emails+'%' OR to= c.Cs3Emails will not list the desired results that's why i am using '%'+c.Cs3Emails+'%'

Latest Findings: Guys the above query return wrong results..... & i don't know why??

But below query works fine:

SELECT  
    e.*
FROM    CustomerEmails e 
WHERE   (ISNULL(e.[From],'') + '/' + ISNULL(e.[To],'')) LIKE  '%[email protected]%' 

Upvotes: 0

Views: 735

Answers (5)

Yaqub Ahmad
Yaqub Ahmad

Reputation: 27659

Well guys first look at the query which is returning the wrong results:

SELECT  
        e.*
FROM    CustomerEmails e 
WHERE   EXISTS 
        (
            SELECT  Email 
            FROM    CustomerEmailIds c 
            WHERE   ( ISNULL(e.[From],'') + '/' + ISNULL(e.[To],'') ) LIKE  '%'+c.Email+'%' 
        )

There is no issue in this query, the issue is that some of the emails in CustomerEmailIds table are invalid emails which are ('.','@','0','-') & that's why the query is returning all the CustomerEmails where these invalid emails exists.

Thanks to Kev Riley! the persons who helped me find out the cause of this issue here!

Upvotes: 0

kheya
kheya

Reputation: 7612

Yaqub, The where clause in your query will never use an index. This is because of 2 problems:

1) you are doing string concatenation on the fly to compare

ISNULL(e.[From],'') + '/' + ISNULL(e.[To],'') ) 

2) you are using wild card at both end after like

LIKE  '%'+c.Cs3Emails+'%'

To optimize, this is what I suggest:

1) You create another column to store the concatenated value of the emails. Extra storage, but this will save you from running slow. Then you can create an index on that column to speed things up.

2) Is it possible that you change the like condition?

LIKE c.Cs3Emails+'%'

Removing the % from begining might use the index (if any on c.Cs3Emails)

Hope this helps

Upvotes: 3

Tudor Constantin
Tudor Constantin

Reputation: 26861

This part of the query:

WHERE   ( ISNULL(e.[From],'') + '/' + ISNULL(e.[To],'') ) LIKE  '%'+c.Cs3Emails+'%'

Forces a full table scan because it uses functions in the WHERE clause and because of like '%....%' - the % sign in the begining. Can you rewrite that part to use fields/constants?

You should make your queries to benefit on the indexed fields - for example, if you have an index on field c.Cs3Emails, your query should look like

WHERE c.Cs3Emails = 'some_value_here'

the some_value_here value should not be the result of functions output (if this ispossible)

Even if your query looks like

WHERE c.Cs3Emails = 'some_value_here%' 

An index can be used, but not when having

WHERE c.Cs3Emails = '%some_value_here%'

Because here, is like you would try to find in a phonebook after the second letter of the name.

Try to read more about database indexes and how to use them

Also, I think you can rewrite your query like:

SELECT  
        e.*
FROM    Emails e 
WHERE   EXISTS 
        (
            SELECT  Cs3Emails 
            FROM    Cs3EmailsForPurge c 
            WHERE   e.From = c.Cs3Emails OR e.To = c.Cs3Emails             
        )

Please try this query and see if it returns the same results as yours

If my query returns the same result set and performance does not improve, add an index on field Emails.From and an index on Emails.To. This will improve your query execution time.

Upvotes: 2

nathan gonzalez
nathan gonzalez

Reputation: 11987

the idea behind this query is flawed. i understand that you are trying to chain the messages together using a tracking number of a sort, probably in the subject line, but the way this should work is that as emails are imported into the database their tracking number should be found and then extracted into its own field. with this you pay a small hit to each import, but receive massive benefits when querying your data.

i would suggest you update your current data and rewrite the import. that query is not going to get any more performant.

Upvotes: 2

Related Questions