Jeff Widman
Jeff Widman

Reputation: 23482

How to find groups of strings in MySQL that share the same (unkown) prefixes?

I'm hunting spammers in my forum, and noticed that frequently spammers will use email addresses from Yahoo or Gmail in the following pattern:

abcde*******[email protected]
abcde*******[email protected]
abcde*******[email protected]

They'll take a string of characters that are the same, followed with random gibberish. Often, but not always they'll end it with random numerals.

To find these, I'd like to pull out all user email addresses that share the same first five characters with at least two other accounts. I'm manually reviewing these before deleting, so not too worried about an infrequent false positive.

The five characters will be random, I don't know them ahead of time, just that they're shared across multiple account emails.

How can I do this in MySQL?

I'm guessing I just need to slice the first five characters off the address, then GROUP BY this slice having count > 2.

But I have no idea what the correct MySQL syntax for this is. I suspect it can be written without a regex, although if necessary I'm using MariaDB 10 which offers several handy regex-based SQL functions.

Upvotes: 1

Views: 208

Answers (2)

Amadan
Amadan

Reputation: 198314

SELECT SUBSTR(email, 1, 5) AS prefix FROM log
GROUP BY prefix
HAVING COUNT(DISTINCT email) > 2

I have a feeling it will be slow though. :)

EDIT: This will work in a table where multiple occurences of the same email address is a possibility, like an access log. If all emails are distinct, matthewh's solution is also okay.

Upvotes: 2

Matt Healy
Matt Healy

Reputation: 18531

Try the MySQL SUBSTRING function:

select SUBSTRING(email,1,5),count(*) from users group by 1 having count(*)>2 

Upvotes: 2

Related Questions