Reputation: 3866
I'm being flooded with spam. The emails are always different, except that each one contains a link like the following, repeated several times:
http://spam.com/hello/world/fk59j356jss5ptttNMdlJ96vmrDsjEeCPDXJf0fBXOi
So I'm trying to put a filter on my server that will scan for a slash followed by 30-50 alphanumeric characters, which then repeats at least 3 times. I wrote the following regular expression, but on regex101.com I keep getting a "timeout" message, probably because there is a better way to write it:
/(\/\w{30,50})(.+?\1){3,}/s
I tried Google, but my search terms never returned what I wanted.
EDIT
Here's the link so you can see: https://regex101.com/r/tL9wK7/2
We can identify the spam link by this part, which always repeats:
/bcaip86eJR2W5hKmMjFiKVWmKyLjmiMKhkOm0Mjh906
There is always something similar in the spam emails (a slash followed by a series of alphanumeric characters). This link is different in every spam email but it will repeat multiple times in the same email.
So checking whether an email contains a slash followed by 30-50 alphanumeric characters, repeated several times within that same email, should reveal that it is spam.
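To make the check concrete, here is a minimal sketch in Python of the same idea done without a backreference: collect every slash-plus-token candidate, then count how often each one appears. This is a swapped-in counting approach, not the regex from the question, and the names `TOKEN`, `looks_like_spam`, and `min_repeats` are hypothetical. The default of 4 occurrences mirrors what `(\/\w{30,50})(.+?\1){3,}` requires (one initial match plus three repeats). Note that `\w` also matches underscores.

```python
import re
from collections import Counter

# A slash followed by 30-50 word characters (letters, digits, underscore).
TOKEN = re.compile(r"/\w{30,50}")


def looks_like_spam(body: str, min_repeats: int = 4) -> bool:
    """Return True if any slash-token occurs at least min_repeats times.

    min_repeats=4 is an assumption that mirrors the original pattern:
    one initial occurrence plus three repeats.
    """
    counts = Counter(TOKEN.findall(body))
    return any(n >= min_repeats for n in counts.values())
```

Because each candidate is found in a single pass and then counted, the cost does not depend on how far apart the repeats are, which is where a lazy `.+?` followed by a backreference tends to spend its time.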
Upvotes: 2
Views: 24251
Reputation: 64
I believe I've improved on your pattern slightly:
/(\/\w{30,})(?:.+?\1){3,}?/s
Demo link: https://regex101.com/r/aNdURv/1
Key changes:
1. Why stop at 50 characters? Shouldn't matter how long the word is as long as it is at least 30. So I removed "50" from the first group.
2. You don't need to capture each repeat, just to count it towards the total you are aiming for (3 or more), so I added "?:" to the second group.
3. You don't need it to find all matching repeats, meaning it can be lazy and stop as soon as it finds at least 3. So I added "?" to the end.
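For illustration, here is a rough sketch of how this pattern could be tried out in Python. It assumes Python's `re` module stands in for whatever engine your server filter uses; the `s` flag from regex101 corresponds to `re.DOTALL`, and the names `PATTERN` and `find_repeated_token` are hypothetical.

```python
import re

# Same pattern as above; re.DOTALL plays the role of the /s flag,
# so "." also matches newlines between the repeated links.
PATTERN = re.compile(r"(/\w{30,})(?:.+?\1){3,}?", re.DOTALL)


def find_repeated_token(body: str):
    """Return the repeated slash-token if the pattern matches, else None."""
    match = PATTERN.search(body)
    return match.group(1) if match else None
```

With the sample link from the question pasted four times into a message, `find_repeated_token` returns the trailing `/fk59...` token; with only three copies it returns `None`, since the pattern needs the first occurrence plus three repeats.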
Upvotes: 5
Reputation: 2553
How about this one?
/\/(\w{30,50})(?:.*\1)(?:.*\1)/sg
This would solve your problem, given example data that meets the criteria. You can see it working by removing the last (?:.*\1) group when used with your regex101 link.
Upvotes: 0
Reputation: 5473
You can try this (a slight modification of your regex):
(\/\w{30,50})(.*?\1){3,}
Upvotes: 0