Ben Currer

Reputation: 305

Regex how to get a full match of nth word (without using non-capturing groups)

I am trying to use a regex to return the nth word in a string. This would be simple enough using other answers to similar questions; however, I do not have access to any of the code. I can only access a regex input field, and the server returns only the full match; it cannot be made to return captured groups such as group 1.

EDIT: From the developers explaining the version of regex used:

"...its javascript regex so should mostly be compatible with perl i believe but not as advanced, its fairly low level so wasn't really intended for use by end users when originally implemented - i added the dropdown with the intention of having some presets going forwards."

/EDIT

Sample String:

One Two Three Four Five

Attempted solution (which is meant to get just the 2nd word):

^(?:\w+ ){1}(\S+)$

The result is:

One Two

I have also tried other variations of the regex:

(?:\w+ ){1}(\S+)$
^(?:\w+ ){1}(\S+)

But these just return the entire string.

I have tried replicating the behaviour that I see using regex101 but the results seem to be different, particularly when changing around the ^ and $.

For example, I get the same output on regex101 if I use the altered regex:

^(?:\w+ ){1}(\S+)

In any case, none of these comparisons has helped me achieve my stated aim.

I am hoping that I have just missed something basic!

===EDIT===

Thanks to all of you who have contributed so far. However, I am still running into issues. I do not know the language or the restrictions on the regex beyond what I can ascertain through trial and error, so here is a list of attempts and results, all of which are trying to return "Two" from a sample of:

One Two Three Four Five

\w+(?=( \w+){1}$)

returns all words

^(\w+ ){1}\K(\w+)

returns no words at all (so I assume that \K does not work)

(\w+? ){1}\K(\w+?)(?= )

returns no words at all

\w+(?=\s\w+\s\w+\s\w+$)

returns all words

^(?:\w+\s){1}\K\w+

returns all words

====

With none of the above working, I thought I would test some other patterns to see the limitations of the system.

Attempting to return the last word:

\w+$

returns all words

This leads me to believe that something strange is going on with the start (^) and end ($) anchors. Perhaps the server adds them automatically if they are omitted? Any more ideas would be greatly appreciated.

Upvotes: 0

Views: 2579

Answers (4)

Tezra

Reputation: 8833

On the downside, you can't rely on lookbehind, because in many engines it has to be a fixed-width pattern. The "full match" is simply whatever text the whole pattern consumes, though, so you just need a pattern that consumes only your word.

With a positive lookahead, you can count from the right. This matches the word that has exactly n words after it:

\w+(?=( \w+){n}$)

If your server's engine supports \K (PCRE and Perl do, JavaScript does not), it discards everything matched so far from the reported match, so you can count from the left:

^(\w+ ){n}\K(\w+)

Unfortunately, regex has no standard "match only the nth occurrence" construct, so counting from the right is the best you can do. (Regex101 also has a searchable quick reference in the bottom-right corner for looking up special characters; just remember that not every regex engine supports all of them.)
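
As a rough illustration of the lookahead approach in JavaScript (the flavor the developers mention), the sketch below builds the pattern for a given count and reads only the full match. The helper name `wordWithNAfter` is made up for this example, and the asker's server may of course behave differently from a plain JavaScript engine.

```javascript
// Hypothetical helper (not part of any library): returns the word that has
// exactly n words after it, using only the full match (index 0).
function wordWithNAfter(text, n) {
  // The lookahead must reach the end of the string, so only the word
  // followed by exactly n more words can satisfy it.
  const pattern = new RegExp("\\w+(?=(?: \\w+){" + n + "}$)");
  const result = pattern.exec(text);
  return result === null ? null : result[0];
}

console.log(wordWithNAfter("One Two Three Four Five", 3)); // Two
```

The group inside the lookahead is written as non-capturing here, but that is only tidiness: the full match is the same either way.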

Upvotes: 0

degant

Reputation: 4981

I'm not sure whether your language supports \K, but I'm sharing this in case it does:

^(?:\w+\s){3}\K\w+

to get the 4th word.

  • ^ represents starting anchor
  • (?:\w+\s){3} is a non-capturing group that matches three words (ending with spaces)
  • \K is a match reset, so it resets the match and the previously matched characters aren't included
  • \w+ helps consume the nth word

And similarly,

  • ^(?:\w+\s){1}\K\w+ for the 2nd word
  • ^(?:\w+\s){2}\K\w+ for the 3rd word
  • ^(?:\w+\s){3}\K\w+ for the 4th word
  • and so on...
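
For what it's worth, JavaScript's regex engine has no \K. Without the u flag it treats \K as a literal K, which may explain why some of these patterns return nothing there. Engines that implement ES2018 accept a lookbehind instead, which keeps the skipped words out of the full match. Whether the asker's server runs such an engine is an assumption, and `nthWord` is a name invented for this sketch.

```javascript
// In JavaScript (no "u" flag), \K is just an escaped literal "K".
console.log(/^\w+\s\K\w+/.test("One Two"));  // false
console.log(/^\w+\s\K\w+/.test("One KTwo")); // true

// Hypothetical helper: nth word (1-based) via lookbehind.
// Requires an engine with ES2018 lookbehind support.
function nthWord(text, n) {
  // The lookbehind checks that exactly n - 1 words precede the match,
  // counted from the start of the string, without consuming them.
  const pattern = new RegExp("(?<=^(?:\\w+\\s){" + (n - 1) + "})\\w+");
  const result = pattern.exec(text);
  return result === null ? null : result[0];
}

console.log(nthWord("One Two Three Four Five", 4)); // Four
```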

Upvotes: 0

matthewjselby

Reputation: 112

It's possible to use reset match (\K) to reset the position of the match and obtain the third word of a string as follows:

(\w+? ){2}\K(\w+?)(?= )

I'm not sure what language you're working in, so you may or may not have access to this feature.

Upvotes: 0

Stephane Janicaud

Reputation: 3627

I don't know if your language supports positive lookbehind, so using your example,

One Two Three Four Five

here is a solution which should work in every language:

\w+ matches the first word

\w+$ matches the last word

\w+(?=\s\w+$) matches the 4th word

\w+(?=\s\w+\s\w+$) matches the 3rd word

\w+(?=\s\w+\s\w+\s\w+$) matches the 2nd word

So if a string contains 10 words:

The first and the last word are easy to find. To find a word at any other position, you simply use this rule:

\w+(?= followed by \s\w+ (10 - position) times followed by $)

Example

In this string :

One Two Three Four Five Six Seven Eight Nine Ten

I want to find the 6th word.

10 - 6 = 4

\w+(?= followed by \s\w+ 4 times followed by $)

Our final regex is

\w+(?=\s\w+\s\w+\s\w+\s\w+$)
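
The rule above can be written as a small JavaScript sketch that builds the pattern from the position and the total word count. The function name `patternForPosition` is hypothetical, and the approach assumes the total number of words is known in advance.

```javascript
// Hypothetical helper: builds the pattern for the word at `position`
// (1-based) in a string known to contain `total` words.
function patternForPosition(position, total) {
  // Repeat \s\w+ once for every word that follows the target word.
  return new RegExp("\\w+(?=" + "\\s\\w+".repeat(total - position) + "$)");
}

const sample = "One Two Three Four Five Six Seven Eight Nine Ten";
const sixth = patternForPosition(6, 10);
console.log(sixth.source);          // \w+(?=\s\w+\s\w+\s\w+\s\w+$)
console.log(sixth.exec(sample)[0]); // Six
```

Only the full match (index 0) is read, which is what the question requires.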

Upvotes: 1
