Reputation: 305
I am trying to use Regex to return the nth word in a string. This would be simple enough using other answers to similar questions; however, I do not have access to any of the code. I can only access a regex input field and the server only returns the 'full match' and cannot be made to return any captured groups such as 'group 1'
EDIT: From the developers explaining the version of regex used:
"...its javascript regex so should mostly be compatible with perl i believe but not as advanced, its fairly low level so wasn't really intended for use by end users when originally implemented - i added the dropdown with the intention of having some presets going forwards."
/EDIT
Sample String:
One Two Three Four Five
Attempted solution (which is meant to get just the 2nd word):
^(?:\w+ ){1}(\S+)$
The result is:
One Two
I have also tried other variations of the regex:
(?:\w+ ){1}(\S+)$
^(?:\w+ ){1}(\S+)
But these just return the entire string.
I have tried replicating the behaviour that I see using regex101 but the results seem to be different, particularly when changing around the ^ and $.
For example, I get the same output on regex101 if I use the altered regex:
^(?:\w+ ){1}(\S+)
In any case, none of the comparing has helped me actually achieve my stated aim.
I am hoping that I have just missed something basic!
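For reference, here is a minimal sketch of the difference between the 'full match' and 'group 1'. It assumes the input field behaves like JavaScript's built-in RegExp, which has not been confirmed for this server. The variable names are only illustrative.

```javascript
// Sample string and pattern taken from the question above.
const sample = "One Two Three Four Five";
const pattern = /^(?:\w+ ){1}(\S+)/;

const result = pattern.exec(sample);

// Index 0 is the full match: everything the pattern consumed,
// including the non-capturing group.
const fullMatch = result[0];

// Index 1 is capture group 1: only the text inside (\S+).
const groupOne = result[1];

console.log(fullMatch); // "One Two"
console.log(groupOne);  // "Two"
```

Since only the full match is returned, the pattern has to be written so that nothing except the wanted word is consumed.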
===EDIT===
Thanks to all of you who have contributed thus far, however, I am still running into issues. I am afraid that I do not know the language or restrictions on the regex other than what I can ascertain through trial and error, therefore here is a list of attempts and results all of which are trying to return "Two" from a sample of:
One Two Three Four Five
\w+(?=( \w+){1}$)
returns all words
^(\w+ ){1}\K(\w+)
returns no words at all (so I assume that \K does not work)
(\w+? ){1}\K(\w+?)(?= )
returns no words at all
\w+(?=\s\w+\s\w+\s\w+$)
returns all words
^(?:\w+\s){1}\K\w+
returns all words
====
With all of the above not working, I thought I would test out some others to see the limitations of the system
Attempting to return the last word:
\w+$
returns all words
This leads me to believe that something strange is going on with the start ^ and end $ characters; perhaps the server puts these in automatically if they are omitted? Any more ideas greatly appreciated.
Upvotes: 0
Views: 2579
Reputation: 8833
So, on the down side, you can't rely on lookbehind, because in many engines it has to be a fixed-width pattern (or isn't supported at all). But the "full match" is just whatever the whole pattern matches, so you need a pattern whose full match is exactly your word.
With a positive lookahead, you can get the nth word from the right:
\w+(?=( \w+){n}$)
If your engine supports it (Perl and PCRE do, JavaScript does not), \K discards everything matched so far from the reported match, but most regex engines don't support this.
^(\w+ ){n}\K(\w+)
Unfortunately, regex doesn't have a standard "match only the nth occurrence", so counting from the right is the best you can do. (Also, regex101 has a searchable quick reference in the bottom right corner for looking up special characters; just remember that not every regex engine supports all of them.)
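As a rough sketch of the lookahead approach, assuming a JavaScript engine: the helper names nthFromRight and fullMatchOf are made up for illustration, and fromRight counts how many words follow the wanted one (0 is the last word).

```javascript
// Hypothetical helper: builds \w+(?=( \w+){n}$) for a given n.
// fromRight = 0 selects the last word, 1 the word before it, and so on.
function nthFromRight(fromRight) {
  return new RegExp("\\w+(?=( \\w+){" + fromRight + "}$)");
}

// Returns only the full match (index 0), mimicking a server that
// never exposes capture groups. Returns null when nothing matches.
function fullMatchOf(regex, text) {
  const m = regex.exec(text);
  return m === null ? null : m[0];
}

const sample = "One Two Three Four Five";

console.log(fullMatchOf(nthFromRight(3), sample)); // "Two"
console.log(fullMatchOf(nthFromRight(0), sample)); // "Five"
```

The capture group inside the lookahead does not affect the full match, because a lookahead consumes no characters.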
Upvotes: 0
Reputation: 4981
I'm not sure if your language supports \K, but I'm sharing this anyway in case it does:
^(?:\w+\s){3}\K\w+
to get the 4th word.
^
represents the starting anchor
(?:\w+\s){3}
is a non-capturing group that matches three words (each followed by a whitespace character)
\K
is a match reset, so it resets the match and the previously matched characters aren't included
\w+
consumes the nth word
And similarly,
^(?:\w+\s){1}\K\w+
for the 2nd word
^(?:\w+\s){2}\K\w+
for the 3rd word
^(?:\w+\s){3}\K\w+
for the 4th word
Upvotes: 0
Reputation: 112
It's possible to use reset match (\K) to reset the position of the match and obtain the third word of a string as follows:
(\w+? ){2}\K(\w+?)(?= )
I'm not sure what language you're working in, so you may or may not have access to this feature.
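One way to check whether the feature is available is to try it. The sketch below assumes a JavaScript engine, where \K without the u flag is read as a literal "K" rather than a match reset, so the pattern above finds nothing in the sample. The lookbehind fallback is an assumption as well: it needs an engine with ES2018 lookbehind support, which the server in the question may or may not have. The helper names tryCompile and fullMatchOf are hypothetical.

```javascript
const sample = "One Two Three Four Five";

// Returns a compiled RegExp, or null if this engine rejects the syntax.
function tryCompile(source) {
  try {
    return new RegExp(source);
  } catch (err) {
    return null;
  }
}

// Returns only the full match (index 0), or null when nothing matches.
function fullMatchOf(regex, text) {
  if (regex === null) return null;
  const m = regex.exec(text);
  return m === null ? null : m[0];
}

// In JavaScript, \K is treated as a literal "K", so this never matches
// a string that contains no "K" after the first two words.
const withReset = tryCompile("(\\w+? ){2}\\K(\\w+?)(?= )");
const resetResult = fullMatchOf(withReset, sample);
console.log(resetResult); // null

// Assumed alternative: a lookbehind that requires two "word + space"
// pairs between the start of the string and the wanted word.
const withLookbehind = tryCompile("(?<=^(?:\\w+ ){2})\\w+");
const lookbehindResult = fullMatchOf(withLookbehind, sample);
console.log(lookbehindResult); // "Three" where lookbehind is supported
```

Changing {2} to {n - 1} in the lookbehind version would select the nth word from the left, under the same assumption about engine support.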
Upvotes: 0
Reputation: 3627
I don't know if your language supports positive lookbehind, so using your example,
One Two Three Four Five
here is a solution which should work in any engine that supports lookahead:
\w+
match the first word
\w+$
match the last word
\w+(?=\s\w+$)
match the 4th word
\w+(?=\s\w+\s\w+$)
match the 3rd word
\w+(?=\s\w+\s\w+\s\w+$)
match the 2nd word
So if a string contains 10 words:
The first and the last word are easy to find. To find a word at any other position, you simply have to use this rule:
\w+(?= followed by \s\w+ repeated (10 - position) times, followed by $)
Example
In this string :
One Two Three Four Five Six Seven Eight Nine Ten
I want to find the 6th word.
10 - 6 = 4
\w+(?= followed by \s\w+ 4 times, followed by $)
Our final regex is
\w+(?=\s\w+\s\w+\s\w+\s\w+$)
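The rule can also be applied mechanically. Here is a small sketch, assuming a JavaScript engine and that the total number of words is known in advance; wordAtPosition is a made-up name, not part of any library.

```javascript
// Hypothetical helper: builds the pattern for the word at a 1-based
// position, given the total number of words in the string.
function wordAtPosition(position, totalWords) {
  if (position < 1 || position > totalWords) {
    throw new RangeError("position must be between 1 and totalWords");
  }
  // One \s\w+ per word that must follow the wanted word.
  const following = "\\s\\w+".repeat(totalWords - position);
  return new RegExp("\\w+(?=" + following + "$)");
}

const text = "One Two Three Four Five Six Seven Eight Nine Ten";
const regex = wordAtPosition(6, 10);

console.log(regex.source);        // \w+(?=\s\w+\s\w+\s\w+\s\w+$)
console.log(regex.exec(text)[0]); // "Six"
```

The limitation is the same as in the rule itself: the pattern counts from the end, so it only selects the right word when the string really has the assumed number of words.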
Upvotes: 1