SuGo
SuGo

Reputation: 139

Finding a Specific Character and/or String SQL Server 2008

I have two columns which contain email info. Column A, Column B. Now, in these fields, emails SHOULD be as such:

 Column A                 Column B
 [email protected]; [email protected]         [email protected]; [email protected]
 [email protected];[email protected]          [email protected];[email protected]

However, in an effort to do some data quality checks and such, it turns out that MANY entries are instead not following this format. I am trying to find all the outliers, and I have identified the outliers to take the form as such:

[email protected] and [email protected]
[email protected], [email protected] (uses comma so it is incorrect)
[email protected] or [email protected]
[email protected] / [email protected]

There could be other wrong characters or words that make the format incorrect. But I hope these examples pinpoint the issue.

What I am trying to do: Create a query that will pinpoint all instances that are NOT in the correct format, so that the problem points can be found and edited later, but that's a different topic :)

Here is a Query I have so far:

SELECT     A_EMAIL, B_EMAIL, NAME, ID
FROM         NAMES

WHERE A_EMAIL LIKE ('and %') OR A_EMAIL LIKE ('or %') 
OR B_EMAIL LIKE ('and %') OR B LIKE ('or %')

This is using LIKE and the % is with a space in between. However, this returns no results, and I know such results definitely do exist. But I would like to build a logic that would bring me back everything that isn't in the proper format instead of trying to use LIKE 'XYZ' because even though I know most of the problem issues, I could still miss some.

However, if such a thing isn't possible via SQL. Then I would still like to get my current logic of using LIKE ('XYZ %') to work instead which even though not an optimal route, should still be able to help me in my goal someway.

Upvotes: 2

Views: 5267

Answers (4)

Arpan Mohokar
Arpan Mohokar

Reputation: 184

I believe a simple Not like query will suffice your requirement here as shown below

Select * From EmailTable Where Email NOT LIKE '%;%'

Upvotes: 0

Vladimir Baranov
Vladimir Baranov

Reputation: 32695

Your query is fine, you just missed one % in it. Instead of this

WHERE A_EMAIL LIKE ('and %') OR A_EMAIL LIKE ('or %') 
OR B_EMAIL LIKE ('and %') OR B LIKE ('or %')

you should use this

WHERE A_EMAIL LIKE ('%and %') OR A_EMAIL LIKE ('%or %') 
OR B_EMAIL LIKE ('%and %') OR B LIKE ('%or %')

Your original query looks for values that start with 'and ', while you are interested in cases where 'and ' appears anywhere inside the column value.

Of course, this is a one-off solution to your immediate problem. The permanent solution is not to store several e-mails in the same column in the first place.

Upvotes: 1

asantaballa
asantaballa

Reputation: 4048

Try something like this

Create Table #Emails (Email varchar(128))
Go

Insert into #Emails
Values 
  ('[email protected]')
, ('[email protected]')
, (Null)
, ('[email protected]')
, ('[email protected] and [email protected]')
, ('[email protected] and [email protected]')
, ('[email protected], [email protected]')
, ('[email protected] or [email protected]')
, ('[email protected] / [email protected]')
Go

Select 
  Email
, Case 
    When Email Is Not Null 
     And LTrim(RTrim(Email)) Like '%_@__%.__%' 
     And LTrim(RTrim(Email)) Not Like '% %' 
        Then 'Good' 
        Else 'Bad' 
  End Quality
From #Emails

Original [BAD]

Not perfect, but select with a like statement as below. Good chance will get you majority

NOT LIKE '%_@__%.__%

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269763

I would suggest that you remove spaces and then look for non-email like characters:

where replace(cola, '; ', ';') like '%[^a-zA-Z0-9@ ;.]%'

That is, the column contains an invalid character.

Then, I would encourage you to create a junction table so you are not storing lists of emails in semi-colon delimited columns.

Upvotes: 3

Related Questions