MitchellK

Reputation: 2642

Apache Exact Match Word Inside String

I am very close to solving this thanks to the post "Regex find word in the string".

But I am still not 100% there.

If I use this regex along with Apache's BrowserMatchNoCase

^(.*?)(\b360Spider\b)(.*)$

I get the following results:

I need it to match the word 360Spider regardless of what is put in front or after the word, so NOT360Spider should be a match.

Thanks in advance. My regex skills have improved somewhat over the years, but I am still nowhere close to getting patterns right without introducing false positives.

At the same time, I do not want to introduce other false positives, which is why I am delving into this in the first place. For example, with user-agent names like "Exabot" and "Alexabot", I don't want the "exabot" part of Alexabot to be detected.

So let's say in another example:

^(.*?)(\bExabot\b)(.*)$

I get the following results:

If I remove word boundaries "\b" as follows:

^(.*?)(Exabot)(.*)$

I get the following results:

So I guess I have to stick with the word boundaries "\b". Now the trick is to get printf to write the "\b" into my string and not treat it as a backspace character.

Upvotes: 1

Views: 438

Answers (1)

Wiktor Stribiżew

Reputation: 627022

Note that once you add word boundaries around 360Spider, you can't match it inside another word, or when it is enclosed with digits or even _ symbols, since those are also considered word characters.

If you need to match the word anywhere inside a string, you need to remove word boundaries, \b. However, judging by your examples, you still need the word boundaries as otherwise, you will match exabot in Alexabot.
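To see the trade-off from the command line, here is a rough sketch that checks a few sample user-agent strings against the bounded pattern. It is only an approximation: it assumes GNU grep, whose \b in -E mode behaves like the PCRE word boundary Apache uses for these simple cases. The helper name ua_matches and the sample strings are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the user-agent string $2 matches the
# extended regex $1, ignoring case. Assumes GNU grep (for \b support).
ua_matches() {
    printf '%s\n' "$2" | grep -qiE -- "$1"
}

# Sample user-agent strings (illustrative only).
for ua in 'Exabot' 'Mozilla/5.0 (compatible; Exabot/3.0)' 'Alexabot' 'Exabot_2'; do
    if ua_matches '\bExabot\b' "$ua"; then
        echo "match:    $ua"
    else
        echo "no match: $ua"
    fi
done
```

With the boundaries in place, Alexabot and Exabot_2 should be rejected, because the letter l and the underscore are both word characters. Dropping the \b pair would make all four strings match.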

Here is a way to define your pattern in Bash:

#!/bin/bash
line='var_here'
printf "BrowserMatchNoCase \"^(.*?)(\\\b${line}\\\b)(.*)\$\" good_bot\n"

See an online demo. Note it is a good idea to escape the $ inside an interpolated string literal.
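As an alternative sketch (not part of the approach above, just a variation on it), you could keep the bot name out of the printf format string altogether and pass it as a %s argument. Inside single quotes the shell leaves the backslashes alone, so \\b in the format is all printf needs to emit a literal \b. The function name emit_rule is hypothetical; good_bot is the variable name used above.

```shell
#!/bin/bash
# Hypothetical helper: print one BrowserMatchNoCase directive for bot name $1.
# The name is passed as a printf argument (%s) rather than spliced into the
# format string, so a stray % or backslash in the name is not interpreted.
emit_rule() {
    # \\b in the single-quoted format is printed as a literal \b.
    printf 'BrowserMatchNoCase "^(.*?)(\\b%s\\b)(.*)$" good_bot\n' "$1"
}

emit_rule '360Spider'
emit_rule 'Exabot'
```

Note that this does not escape regex metacharacters in the name itself, so a bot name containing something like a dot would still need to be escaped by hand.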

Upvotes: 1
