Alon Dor

Reputation: 111

Regular expression for matching a sequence?

My text contains phrases wrapped in a known pattern, for example #%some phrase%#. The phrase can be anything (though it will obviously not contain the sequence %#).
I want to build a regex (in PHP) that matches a sequence of two or more phrases, with or without whitespace between them. For example, if my text is:

#%jjj jjj%#  kkjjkkjj kkjjkkjj  #%kkk kkk%# #%ttt mmm%#

I want the regex to match:

#%kkk kkk%# #%ttt mmm%#

I've tried this regex: /(?:#%.+?(?!%#).%#\s*){2,}/

But for some reason it matches the whole string and seems to ignore the negative lookahead.
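The behaviour can be reproduced with a minimal PHP sketch using the sample text from the question (the variable names are only illustrative). The lookahead (?!%#) is tested at just one position, right after .+? stops, so it does not prevent .+? itself from running past a %# when the {2,} repetition needs a second #% in order to succeed.

```php
<?php
// Sample text from the question (note the double spaces around the filler words).
$text = '#%jjj jjj%#  kkjjkkjj kkjjkkjj  #%kkk kkk%# #%ttt mmm%#';

// The original attempt. (?!%#) only guards the one character right after .+?,
// so .+? itself may still step over "%#" while the engine backtracks to find
// a second "#%" for the {2,} repetition.
preg_match('/(?:#%.+?(?!%#).%#\s*){2,}/', $text, $m);

var_dump($m[0] === $text); // bool(true): the whole string is one match
```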

Beyond that, my full task is to match a sequence of phrases separated by at most one character (in addition to any whitespace).

How can I implement this?

Test cases:

Text:

#%Prime target%# #%Online stuff%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#

Should match:

  1. #%Prime target%# #%Online stuff%#
  2. #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#

Text:

#%Prime target%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#

Should match:

  1. #%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#

Upvotes: 2

Views: 145

Answers (3)

pushpraj

Reputation: 13679

Based on your test inputs, I came up with this regex; it is short and still effective:

/((?:#%[^#]*%#(?:\s.\s|\s)){2,})/g

test string

test 1

#%Prime target%# #%Online stuff%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#

test 2

#%Prime target%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#

result

  • MATCH 1
    1. [8-42] #%Prime target%# #%Online stuff%#
  • MATCH 2
    1. [100-151] #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#
  • MATCH 3
    1. [236-293] #%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#

try demo here
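A sketch of the same pattern used from PHP, assuming preg_match_all() in place of the /g flag (PHP's preg functions have no g modifier). Each test string is given a trailing newline, mirroring the multi-line demo input; the $tests and $found names are illustrative only.

```php
<?php
// pushpraj's pattern without the /g flag: in PHP, preg_match_all() plays that role.
$pattern = '/((?:#%[^#]*%#(?:\s.\s|\s)){2,})/';

// The two test cases, each followed by a newline as in the demo input.
$tests = [
    "#%Prime target%# #%Online stuff%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#\n",
    "#%Prime target%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#\n",
];

$found = [];
foreach ($tests as $text) {
    preg_match_all($pattern, $text, $m);
    foreach ($m[1] as $run) {
        // Each repetition also consumes the whitespace after its phrase; trim it off.
        $found[] = trim($run);
    }
}

print_r($found);
```

Note that every repetition requires at least one whitespace character after its phrase, so if the input ends right after the final %#, that last phrase is left out of the match. The [^#]* part also means a phrase may not contain a # character.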

Upvotes: 1

Unihedron

Reputation: 11051

You have to modify your regex:

(?:#%(?:(?!%#).)+?.%#\s*)(?:.?\s*#%(?:(?!%#).)+?.%#\s*)+

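Putting the lookahead inside a non-capturing group that the lazy quantifier repeats, (?:(?!%#).)+?, makes (?!%#) apply to every character the lazy match consumes, so a phrase can never run past a %#. In your original regex the lookahead is checked only once, after .+? has stopped, which is why it appears to be ignored.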
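A small PHP check of that difference, using a hypothetical input and ^...\z anchors chosen only for illustration: the anchors force each pattern to treat the whole string as a single #%...%# phrase, which is possible only if the phrase body may swallow an inner %#.

```php
<?php
// Hypothetical input: two delimited phrases with filler text between them.
$text = '#%jjj jjj%#  kkjjkkjj  #%kkk kkk%#';

// Lookahead outside the quantifier (as in the question): checked once, after .+? stops.
// The ^...\z anchors force the pattern to treat the whole string as ONE phrase.
$loose = preg_match('/^#%.+?(?!%#).%#\z/', $text);

// Lookahead inside the repeated group: re-checked before every character consumed.
$tempered = preg_match('/^#%(?:(?!%#).)+?.%#\z/', $text);

var_dump($loose, $tempered); // int(1) then int(0)
```

The first pattern succeeds because .+? simply walks over the first %#; the second fails because the repeated group refuses to consume a character where %# begins.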

Also, the phrase pattern is repeated in a second group prefixed with .?\s*, so that one optional character (plus any whitespace) is accepted between phrases.

Here is a regex demo!

Test case:

#%jjj jjj%# kkjjkkjj kkjjkkjj #%kkk kkk%# #%ttt mmm%#

Match:
#%kkk kkk%# #%ttt mmm%#
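A PHP sketch running the full pattern with preg_match_all() over the question's sample and both test cases (the $texts and $found names are illustrative). Because of the trailing \s*, a match can end with whitespace, so the sketch applies rtrim() to each match before comparing.

```php
<?php
// Unihedron's pattern: one phrase, then one or more further phrases, each
// optionally preceded by a single separator character (.?) and whitespace.
$pattern = '/(?:#%(?:(?!%#).)+?.%#\s*)(?:.?\s*#%(?:(?!%#).)+?.%#\s*)+/';

$texts = [
    '#%jjj jjj%#  kkjjkkjj kkjjkkjj  #%kkk kkk%# #%ttt mmm%#',
    '#%Prime target%# #%Online stuff%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#',
    '#%Prime target%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#',
];

$found = [];
foreach ($texts as $text) {
    preg_match_all($pattern, $text, $m);
    // The trailing \s* can leave whitespace at the end of a match; strip it.
    $found[] = array_map('rtrim', $m[0]);
}

print_r($found);
```

One caveat: since the phrase part is (?:(?!%#).)+? followed by one more ., a phrase must be at least two characters long, so input such as #%a%# #%b%# is not matched.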

Upvotes: 1

Marcin Nabiałek

Reputation: 111869

I think you want:

/(?:.*?#%.*?%#.*?)(#%.*%#)/g

It finds the first #%...%# lazily (ungreedy) and then captures from the next #% through the last %# on the same line (greedy).
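A PHP sketch of what group 1 captures, run on two inputs taken from the question (the /g flag is dropped because PHP has no g modifier; variable names are illustrative). On the question's first sample it yields the two trailing phrases; on the first test case the greedy .* carries the capture from the second phrase all the way to the final %#.

```php
<?php
// Marcin's pattern without the /g flag (PHP has no g modifier).
$pattern = '/(?:.*?#%.*?%#.*?)(#%.*%#)/';

// The question's first sample: group 1 is the two trailing phrases.
$sample = '#%jjj jjj%#  kkjjkkjj kkjjkkjj  #%kkk kkk%# #%ttt mmm%#';
preg_match($pattern, $sample, $m);
$fromSample = $m[1];

// First test case: the greedy .* runs on to the last %# on the line,
// so group 1 starts at the second phrase and includes the filler words.
$test1 = '#%Prime target%# #%Online stuff%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#';
preg_match($pattern, $test1, $m);
$fromTest1 = $m[1];

echo $fromSample, PHP_EOL; // #%kkk kkk%# #%ttt mmm%#
echo $fromTest1, PHP_EOL;
```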

Demo

Upvotes: 0
