Reputation: 139
I have two columns which contain email info. Column A, Column B. Now, in these fields, emails SHOULD be as such:
Column A Column B
[email protected]; [email protected] [email protected]; [email protected]
[email protected];[email protected] [email protected];[email protected]
However, in an effort to do some data quality checks and such, it turns out that MANY entries are instead not following this format. I am trying to find all the outliers, and I have identified the outliers to take the form as such:
[email protected] and [email protected]
[email protected], [email protected] (uses comma so it is incorrect)
[email protected] or [email protected]
[email protected] / [email protected]
There could be other wrong characters or words that make the format incorrect. But I hope these examples pinpoint the issue.
What I am trying to do: Create a query that will pinpoint all instances that are NOT in the correct format, so that the problem points can be found and edited later, but that's a different topic :)
Here is a Query I have so far:
SELECT A_EMAIL, B_EMAIL, NAME, ID
FROM NAMES
WHERE A_EMAIL LIKE ('and %') OR A_EMAIL LIKE ('or %')
OR B_EMAIL LIKE ('and %') OR B LIKE ('or %')
This is using LIKE and the % is with a space in between. However, this returns no results, and I know such results definitely do exist. But I would like to build a logic that would bring me back everything that isn't in the proper format instead of trying to use LIKE 'XYZ' because even though I know most of the problem issues, I could still miss some.
However, if such a thing isn't possible via SQL. Then I would still like to get my current logic of using LIKE ('XYZ %') to work instead which even though not an optimal route, should still be able to help me in my goal someway.
Upvotes: 2
Views: 5267
Reputation: 184
I believe a simple Not like query will suffice your requirement here as shown below
Select * From EmailTable Where Email NOT LIKE '%;%'
Upvotes: 0
Reputation: 32695
Your query is fine, you just missed one %
in it.
Instead of this
WHERE A_EMAIL LIKE ('and %') OR A_EMAIL LIKE ('or %')
OR B_EMAIL LIKE ('and %') OR B LIKE ('or %')
you should use this
WHERE A_EMAIL LIKE ('%and %') OR A_EMAIL LIKE ('%or %')
OR B_EMAIL LIKE ('%and %') OR B LIKE ('%or %')
Your original query looks for values that start with 'and ', while you are interested in cases where 'and ' appears anywhere inside the column value.
Of course, this is a one-off solution to your immediate problem. The permanent solution is not to store several e-mails in the same column in the first place.
Upvotes: 1
Reputation: 4048
Try something like this
Create Table #Emails (Email varchar(128))
Go
Insert into #Emails
Values
('[email protected]')
, ('[email protected]')
, (Null)
, ('[email protected]')
, ('[email protected] and [email protected]')
, ('[email protected] and [email protected]')
, ('[email protected], [email protected]')
, ('[email protected] or [email protected]')
, ('[email protected] / [email protected]')
Go
Select
Email
, Case
When Email Is Not Null
And LTrim(RTrim(Email)) Like '%_@__%.__%'
And LTrim(RTrim(Email)) Not Like '% %'
Then 'Good'
Else 'Bad'
End Quality
From #Emails
Original [BAD]
Not perfect, but select with a like statement as below. Good chance will get you majority
NOT LIKE '%_@__%.__%
Upvotes: 0
Reputation: 1269763
I would suggest that you remove spaces and then look for non-email like characters:
where replace(cola, '; ', ';') like '%[^a-zA-Z0-9@ ;.]%'
That is, the column contains an invalid character.
Then, I would encourage you to create a junction table so you are not storing lists of emails in semi-colon delimited columns.
Upvotes: 3