NaturalBornCamper

Reputation: 3866

Regex to find a pattern repeating at least n times

I'm being destroyed by spam and the emails are always different except that they always have similar links like this that repeat several times:

http://spam.com/hello/world/fk59j356jss5ptttNMdlJ96vmrDsjEeCPDXJf0fBXOi

So I'm trying to put a filter on my server that will scan for a slash followed by 30-50 alphanumeric characters, repeated at least 3 times. I wrote the following regular expression, but on regex101.com I keep getting a "timeout" message, probably because there is a better way to write it:

/(\/\w{30,50})(.+?\1){3,}/s

I tried Google, but my search terms never returned what I wanted.

EDIT

Here's the link so you can see: https://regex101.com/r/tL9wK7/2

We can identify the spam link by this part, which always repeats:

/bcaip86eJR2W5hKmMjFiKVWmKyLjmiMKhkOm0Mjh906

There is always something similar in the spam emails (a slash followed by a series of alphanumeric characters). This link is different in every spam email but it will repeat multiple times in the same email.

So checking whether an email contains a link with a slash followed by 30-50 alphanumeric characters that appears several times in that same email will reveal whether it is spam.

Upvotes: 2

Views: 24251

Answers (3)

Jeff

Reputation: 64

I believe I've improved on your pattern slightly:

/(\/\w{30,})(?:.+?\1){3,}?/s

Demo link: https://regex101.com/r/aNdURv/1

Key changes:
1. Why stop at 50 characters? The length of the word shouldn't matter as long as it is at least 30, so I removed "50" from the first group.
2. You don't need to capture each repeat, only count it towards the total you are aiming for (3 or more), so I added "?:" to the second group.
3. You don't need to find every matching repeat; the quantifier can be lazy and stop as soon as it has found 3. So I added "?" to the end.
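
As a rough illustration of how the revised pattern behaves, here is a minimal Python sketch. The helper name `looks_like_spam` and the sample email bodies are made up for this example, and `re.DOTALL` stands in for the `/s` flag. Note that `{3,}` counts repeats after the first occurrence, so the link has to appear four times in total before the pattern matches.

```python
import re

# The pattern above, written for Python's re module.
# re.DOTALL plays the role of /s, so "." can cross line breaks.
SPAM_LINK = re.compile(r"(/\w{30,})(?:.+?\1){3,}?", re.DOTALL)


def looks_like_spam(body: str) -> bool:
    """Hypothetical helper: True if a long /token shows up 4+ times."""
    return SPAM_LINK.search(body) is not None


# Token taken from the question; the surrounding text is invented.
token = "/fk59j356jss5ptttNMdlJ96vmrDsjEeCPDXJf0fBXOi"
four_links = "\n".join(
    f"line {i}: http://spam.com/hello/world{token}" for i in range(4)
)
three_links = "\n".join(
    f"line {i}: http://spam.com/hello/world{token}" for i in range(3)
)

print(looks_like_spam(four_links))   # first occurrence + 3 repeats
print(looks_like_spam(three_links))  # first occurrence + 2 repeats only
```

If three occurrences in total should already count as spam, the quantifier would need to be `{2,}` instead; which threshold is right depends on the mail you actually receive.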

Upvotes: 5

melwil

Reputation: 2553

How about this one?

/\/(\w{30,50})(?:.*\1)(?:.*\1)/sg

This would solve your problem, given example data that meets the criteria. You can see it working by removing the last group, (?:.*\1), when you try it with your regex101 link.
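
To make the counting concrete, here is a small Python sketch of the same pattern. The `sample_body` helper and its text are invented for illustration, and `re.DOTALL` stands in for `/s` (the `/g` flag has no direct counterpart; `finditer` would be the closest equivalent). Because the slash sits outside the capture group, `\1` only needs the token itself to reappear, and two `(?:.*\1)` groups mean three occurrences in total.

```python
import re

# The pattern above for Python's re module; re.DOTALL stands in for /s.
PATTERN = re.compile(r"/(\w{30,50})(?:.*\1)(?:.*\1)", re.DOTALL)

# Token from the question, without its leading slash.
token = "bcaip86eJR2W5hKmMjFiKVWmKyLjmiMKhkOm0Mjh906"


def sample_body(copies: int) -> str:
    """Build a made-up email body that contains the link `copies` times."""
    return "\n".join(
        f"Click: http://spam.com/hello/world/{token}" for _ in range(copies)
    )


three = PATTERN.search(sample_body(3))  # a match object
two = PATTERN.search(sample_body(2))    # None: one repeat is missing

print(three is not None and three.group(1) == token)
print(two is None)
```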

Upvotes: 0

Kamehameha

Reputation: 5473

You can try this (a slight modification of your regex):

(\/\w{30,50})(.*?\1){3,}

Demo here
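
One thing to watch when moving this pattern out of regex101: as written here it carries no flags, and without the single-line (`s`) flag `.` does not match a newline, so all the repeats would have to sit on one line. Whether the linked demo set that flag is not visible here. A small Python sketch with made-up sample text shows the difference:

```python
import re

BODY = r"(/\w{30,50})(.*?\1){3,}"
no_flags = re.compile(BODY)            # "." stops at line breaks
dot_all = re.compile(BODY, re.DOTALL)  # "." also matches "\n", like /s

# Token from the question; the URLs around it are invented.
token = "/fk59j356jss5ptttNMdlJ96vmrDsjEeCPDXJf0fBXOi"
one_line = " ".join(f"http://spam.com/hello/world{token}" for _ in range(4))
multi_line = one_line.replace(" ", "\n")  # same links, one per line

print(no_flags.search(one_line) is not None)    # repeats on a single line
print(no_flags.search(multi_line) is not None)  # repeats split across lines
print(dot_all.search(multi_line) is not None)   # same text, DOTALL enabled
```

In a real email the links are usually spread over several lines, so the flag is probably wanted.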

Upvotes: 0
