Reputation: 2642
I am very close to solving this, thanks to the post "Regex find word in the string".
But I am still not 100% there.
If I use this regex with Apache's BrowserMatchNoCase:
^(.*?)(\b360Spider\b)(.*)$
I get the following results:
I need it to match the word 360Spider regardless of what comes before or after it, so NOT360Spider should also be a match.
Thanks in advance. My regex has improved somewhat over the years, but I am still nowhere near able to get things exactly right without introducing false positives.
At the same time, I do not want to introduce other false positives, which is why I am delving into this in the first place. For example, with user-agent names like "Exabot" and "Alexabot", I don't want the "exabot" part of Alexabot to be detected.
Take another example:
^(.*?)(\bExabot\b)(.*)$
I get the following results:
If I remove word boundaries "\b" as follows:
^(.*?)(Exabot)(.*)$
I get the following results:
So I guess I have to stick with the word boundaries "\b". Now the trick is to get printf to write "\b" into my string instead of interpreting it as a backspace character.
Upvotes: 1
Views: 438
Reputation: 627022
Note that once you add word boundaries around 360Spider, you can't match it inside another word, nor when it is directly enclosed by digits or even _ symbols, which are also considered word chars.
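To make that rule concrete, here is a minimal Bash sketch, run outside Apache, that emulates \b360Spider\b in portable POSIX ERE via grep -Ei: the name must be preceded by the start of the line or a non-word character, and followed by a non-word character or the end of the line. The function name has_whole_word and the sample user-agent strings are hypothetical, and the sketch assumes the bot name contains no regex metacharacters.

```shell
#!/bin/bash
# Emulates \bWORD\b (case-insensitive) with POSIX ERE:
# WORD must be preceded by start-of-line or a non-word char, and
# followed by a non-word char or end-of-line. Word chars: letters, digits, _.
# Assumption: WORD contains no regex metacharacters.
has_whole_word() {
  local word=$1 text=$2
  printf '%s\n' "$text" |
    grep -Eiq "(^|[^[:alnum:]_])${word}([^[:alnum:]_]|\$)"
}

# Hypothetical user-agent strings, for illustration only.
for ua in 'Mozilla/5.0 (compatible; 360Spider)' 'NOT360Spider' \
          '360Spider_bot' '360Spider2' '360spider/1.0'; do
  if has_whole_word '360Spider' "$ua"; then
    echo "match:    $ua"
  else
    echo "no match: $ua"
  fi
done
```

Only the first and last strings print "match:", because a letter, digit or _ directly next to the name counts as a word character and so blocks the boundary.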
If you need to match the word anywhere inside a string, you need to remove the word boundaries, \b. However, judging by your examples, you still need the word boundaries, as otherwise you will match exabot in Alexabot.
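The trade-off can be seen side by side with a similar sketch: a plain case-insensitive substring search (grep -iF) catches both NOT360Spider and Alexabot, while the whole-word version rejects both. The function names substring_match and whole_word_match and the test strings are illustrative assumptions, not anything Apache provides; the bot name is again assumed to contain no regex metacharacters.

```shell
#!/bin/bash
# Case-insensitive fixed-string search: matches the name anywhere.
substring_match() {
  printf '%s\n' "$2" | grep -iqF -- "$1"
}
# Case-insensitive whole-word search, emulating \bNAME\b in POSIX ERE.
whole_word_match() {
  printf '%s\n' "$2" | grep -Eiq "(^|[^[:alnum:]_])$1([^[:alnum:]_]|\$)"
}

# Each line: bot name, then a hypothetical user-agent string.
while read -r name ua; do
  printf '%-10s %-38s substring=%s whole-word=%s\n' "$name" "$ua" \
    "$(substring_match "$name" "$ua" && echo yes || echo no)" \
    "$(whole_word_match "$name" "$ua" && echo yes || echo no)"
done <<'EOF'
Exabot Mozilla/5.0 (compatible; Exabot/3.0)
Exabot Alexabot/1.0
360Spider NOT360Spider
EOF
```

Both Alexabot/1.0 and NOT360Spider report substring=yes and whole-word=no, so no single pattern of this shape accepts NOT360Spider while rejecting Alexabot.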
Here is a way to define your pattern in Bash:
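#!/bin/bash
line='var_here'
# Inside double quotes, bash turns \\\b into \\b; printf then turns \\ into \,
# so a literal \b reaches the output instead of a backspace.
# \" and \$ produce a literal " and $.
printf "BrowserMatchNoCase \"^(.*?)(\\\b${line}\\\b)(.*)\$\" good_bot\n"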
See an online demo. Note that it is a good idea to escape the $ inside an interpolated (double-quoted) string literal.
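A possible variation, sketched here as an alternative rather than a correction: keep the format string in single quotes, so bash leaves it untouched and only printf's escape processing applies, and pass the bot name as a %s argument instead of interpolating ${line} into the format. That way a % or backslash in the name is printed as-is. The function name make_rule and the bot list are hypothetical.

```shell
#!/bin/bash
# Builds one BrowserMatchNoCase line per bot name.
# Single quotes: bash passes the format through unchanged.
# printf turns \\b into a literal \b and \n into a newline;
# %s inserts the name verbatim (no escape or % processing).
make_rule() {
  printf 'BrowserMatchNoCase "^(.*?)(\\b%s\\b)(.*)$" good_bot\n' "$1"
}

# Hypothetical bot list; replace with your own.
for name in 360Spider Exabot; do
  make_rule "$name"
done
```

This prints BrowserMatchNoCase "^(.*?)(\b360Spider\b)(.*)$" good_bot followed by the same line for Exabot; the output can then be redirected into whichever config file you include in Apache.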
Upvotes: 1