Reputation: 23482
I'm hunting spammers in my forum, and noticed that frequently spammers will use email addresses from Yahoo or Gmail in the following pattern:
abcde*******[email protected]
abcde*******[email protected]
abcde*******[email protected]
They'll take a string of characters that are the same, followed with random gibberish. Often, but not always they'll end it with random numerals.
To find these, I'd like to pull out all user email addresses that share the same first five characters with at least two other accounts. I'm manually reviewing these before deleting, so not too worried about an infrequent false positive.
The five characters will be random, I don't know them ahead of time, just that they're shared across multiple account emails.
How can I do this in MySQL?
I'm guessing I just need to slice the first five characters off the address, then GROUP BY
this slice having count > 2.
But I have no idea what the correct MySQL syntax for this is. I suspect it can be written without a regex, although if necessary I'm using MariaDB 10 which offers several handy regex-based SQL functions.
Upvotes: 1
Views: 208
Reputation: 198314
SELECT SUBSTR(email, 1, 5) AS prefix FROM log
GROUP BY prefix
HAVING COUNT(DISTINCT email) > 2
I have a feeling it will be slow though. :)
EDIT: This will work in a table where multiple occurences of the same email address is a possibility, like an access log. If all emails are distinct, matthewh's solution is also okay.
Upvotes: 2
Reputation: 18531
Try the MySQL SUBSTRING function:
select SUBSTRING(email,1,5),count(*) from users group by 1 having count(*)>2
Upvotes: 2