Vj_x49x

Reputation: 47

Match a specific regex using matches()

I am trying to match a specific word using matches():

*//id[matches(.,lower-case('*\s?Xander\s?*'))]

Examples:

Set of Xanderous - No match
Xander Tray of 6 - Match
Tray of 6 pieces Xander - Match
Set of 6 Xander pieces - Match

The objective is to match any occurrence of the exact word 'Xander'.

Upvotes: 0

Views: 114

Answers (3)

Michael Kay

Reputation: 163262

The reason the XPath regex dialect doesn't handle word boundaries is that to do it properly, you need to be language-sensitive - a "word" is a cultural artefact.

You could use tokenize(., '\P{L}+') = 'Xander', which splits the string on any sequence of non-letter characters and then tests whether one of the resulting tokens is 'Xander'.
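As a rough way to check this logic against the examples in the question, here is a minimal Python sketch that emulates the tokenize approach outside an XPath engine. It is only a stand-in under stated assumptions: Python's re module has no \P{L}, so the class [\W\d_] is used as an approximation of "non-letter", and the function name contains_word is made up for illustration.

```python
import re

def contains_word(text, word="Xander"):
    """Emulate tokenize(., '\\P{L}+') = 'Xander' (hypothetical helper)."""
    # Split on runs of non-letter characters. [\W\d_] only approximates
    # XPath's \P{L}; the two may differ on unusual Unicode characters.
    tokens = re.split(r"[\W\d_]+", text)
    # XPath's "=" on a sequence is true if ANY item equals the value,
    # which is what the membership test below mirrors.
    return word in tokens

examples = [
    "Set of Xanderous",
    "Xander Tray of 6",
    "Tray of 6 pieces Xander",
    "Set of 6 Xander pieces",
]
for s in examples:
    print(s, "->", contains_word(s))
```

Like the XPath expression, this comparison is case-sensitive, so 'xander' in lower case would not count as a match.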

Upvotes: 2

JvdV

Reputation: 75840

I ran some tests, and it seems word boundaries are not part of the XPath regex syntax. The next best thing, in my opinion, is to test for whitespace or a start/end-of-string anchor on either side of the word, with zero or more characters around that. I ended up with:

*//id[matches(lower-case(.),'.*(^|\s)xander($|\s).*')]

Even better would be to drop lower-case altogether and use the third parameter of matches (flags), setting it to case-insensitive matching:

*//id[matches(.,'.*(^|\s)xander($|\s).*','i')] 
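To see how this pattern behaves on the question's examples, here is a small Python sketch that mimics it with the re module. It is an illustration rather than the XPath engine itself: re.search is used because, like matches(), it succeeds when any part of the string matches, so the leading and trailing .* are left out. The names WORD_PATTERN and matches_word are hypothetical.

```python
import re

# Whitespace or a string anchor on each side of the word; re.IGNORECASE
# plays the role of the 'i' flag in the XPath call.
WORD_PATTERN = re.compile(r"(^|\s)xander($|\s)", re.IGNORECASE)

def matches_word(text):
    """Return True if 'xander' appears delimited by whitespace or string ends."""
    return WORD_PATTERN.search(text) is not None

examples = [
    "Set of Xanderous",
    "Xander Tray of 6",
    "Tray of 6 pieces Xander",
    "Set of 6 Xander pieces",
]
for s in examples:
    print(s, "->", matches_word(s))
```

One consequence of this design is that punctuation next to the word, as in "Xander, tray", does not count as a delimiter, so such a string would not match.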

Upvotes: 1

Sumak

Reputation: 1051

Roughly, if you want to match the whole line whenever it contains the exact word Xander, you can use \b, which marks a word boundary, together with a greedy .* on either side:

^.*\bXander\b.*$

Demo: https://regex101.com/r/PvKptN/1

Or if you don't need the whole line, you can simply check if it contains Xander:

\bXander\b

Demo: https://regex101.com/r/PvKptN/2

I hope this works with the regex flavor you're using.

Upvotes: -1
