Gabriele B

Reputation: 2685

Regex matching of substrings separated by max N words

I've done extensive googling, but I was not able to find a working expression. What I want is to match this meta-expression:

Blah Blah Blah, I'm looking for [max N words] player

In other words, I need to match:

Even these days I'm looking for a couple of players
I'm looking for an experienced player
I'm looking here and there to find a good player  <--- Must not match!
I'm looking for a player

and so on...

As you can see, I'm not counting characters, but words.

N will probably be 5 in my case.

I don't need to return the match; I just need to check whether this n-gram pattern occurs in the strings.

EDIT: Edited the third line (the one without the 'for') for clarification

Upvotes: 0

Views: 84

Answers (4)

terdon

Reputation: 3380

The details will depend on which regex flavor you are using. In flavors that support it, you can match between 1 and N repetitions of an expression with the {1,N} quantifier. For example, using this test file:

Even these days I'm looking for a couple of players
I'm looking for an experienced player
I'm looking here and there to find a good player
I'm looking for a player
I'm looking for too many words here, it should not match player

Using GNU grep to illustrate, with a maximum of 3 words:

  1. Basic Regular Expressions (BRE):
    $ grep -o "I'm looking for \([^[:blank:]]* \)\{1,3\}players*" file
    I'm looking for a couple of players
    I'm looking for an experienced player
    I'm looking for a player
  2. Extended Regular Expressions (ERE):
    $ grep -oE "I'm looking for ([^[:blank:]]* ){1,3}players*" file
    I'm looking for a couple of players
    I'm looking for an experienced player
    I'm looking for a player
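
For flavors other than grep, the same bounded-repetition idea carries over. The following is only a minimal sketch in Python's `re` module, assuming the same test lines and the same limit of 3 words; the names `MAX_WORDS`, `pattern`, `lines` and `matched` are made up for illustration.

```python
import re

MAX_WORDS = 3  # same limit as the grep examples; the question suggests 5

# Between the two fixed phrases, allow 1..MAX_WORDS runs of non-whitespace,
# each followed by whitespace.
pattern = re.compile(r"I'm looking for (?:\S+\s+){1,%d}players?\b" % MAX_WORDS)

lines = [
    "Even these days I'm looking for a couple of players",
    "I'm looking for an experienced player",
    "I'm looking here and there to find a good player",
    "I'm looking for a player",
    "I'm looking for too many words here, it should not match player",
]

# Only a yes/no check is needed, so search() is enough.
matched = [line for line in lines if pattern.search(line)]
for line in matched:
    print(line)
```

Under these assumptions the first, second and fourth lines are printed, which mirrors the grep results above.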

Upvotes: 1

Toto

Reputation: 91508

I'd do:

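(?<=looking for)(?:\s+\S+){1,5}\s+(?=player)

Where (?:\s+\S+){1,5} matches one or more whitespace characters followed by one or more non-whitespace characters, repeated one to five times. The lookbehind and lookahead require "looking for" before and "player" after, without making them part of the match.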
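
As a quick check, this pattern can be tried in any flavor with lookbehind support. Here is a minimal sketch in Python, assuming the lookahead is meant to be (?=player) and that N is 5 as in the question; `has_ngram` is a hypothetical helper name.

```python
import re

# Assumption: the lookahead is intended to be (?=player), and N = 5.
pattern = re.compile(r"(?<=looking for)(?:\s+\S+){1,5}\s+(?=player)")

def has_ngram(text):
    """Return True if 1 to 5 words sit between 'looking for' and 'player'."""
    return pattern.search(text) is not None

print(has_ngram("I'm looking for a player"))
print(has_ngram("I'm looking here and there to find a good player"))
print(has_ngram("I'm looking for one two three four five six player"))
```

The first call should report a match; the other two should not, because one lacks "looking for" and the other has six words in between.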

Upvotes: 0

Federico Piazza

Reputation: 31045

If you want to capture the content within that you could use a regex like this:

(?<=looking for)(.*)(?=player)


The match content will be:

MATCH 1
1.  [31-44] ` a couple of `
MATCH 2
1.  [67-83] ` an experienced `
MATCH 3
1.  [154-157] ` a `

Btw, if you don't want to use lookarounds you could simply use:

looking for(.*)player

On the other hand, since example 3 contains "here" instead of "for", the regex above won't match it. If you want to include that line too, you could use this regex:

looking (?:for|here)(.*)player
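
Note that (.*) matches any amount of text, so these patterns do not limit the number of words on their own. If the limit from the question matters, one possible approach is to count the words in the captured group afterwards. This is only a sketch in Python; `MAX_WORDS` and `within_limit` are names invented here.

```python
import re

MAX_WORDS = 5  # the N from the question

# Same capture-based pattern as above, without lookarounds.
pattern = re.compile(r"looking for(.*)player")

def within_limit(text, max_words=MAX_WORDS):
    """True if the pattern matches and the captured text has at most max_words words."""
    m = pattern.search(text)
    if m is None:
        return False
    # split() with no arguments splits on any run of whitespace.
    return len(m.group(1).split()) <= max_words

print(within_limit("Even these days I'm looking for a couple of players"))
print(within_limit("I'm looking for too many words here, it should not match player"))
```

The design trade-off is that the regex stays simple while the word limit lives in ordinary code, which can be easier to adjust than a quantifier.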

Upvotes: 0

CodingDuckling

Reputation: 583

Is this something along the lines of what you are looking for?

.*(I'm looking for) (.*) (player{1}s?)

http://regex101.com/r/zT0qR4/1

I saw that some lines have "player" and others "players". Also, as Avinash says, do you want to match line #3 as well?

The words you are looking for end up in capture group 2 ($2). Alternatively, you can add ?: to the other groups to make them non-capturing (?= would turn them into lookaheads instead).
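
To illustrate the difference between capturing and non-capturing groups, here is a small sketch in Python using the pattern from this answer on one of the sample lines. The variable names are arbitrary; note that once the outer groups are non-capturing, the words become group 1 rather than group 2.

```python
import re

line = "Even these days I'm looking for a couple of players"

# All three groups capture; the words in between are group 2.
capturing = re.search(r".*(I'm looking for) (.*) (player{1}s?)", line)

# (?:...) groups still group, but are not numbered; the words are now group 1.
non_capturing = re.search(r".*(?:I'm looking for) (.*) (?:player{1}s?)", line)

print(capturing.groups())
print(non_capturing.groups())
```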

Upvotes: 0
