Reputation: 111
My text includes phrases inside a known pattern, for example #%some phrase%#. The phrase can be anything (obviously it will not include the pattern '%#').
Now I want to build a regex (in PHP) that matches a sequence of 2 or more phrases, with or without whitespace between them. For example, if my text is:
#%jjj jjj%# kkjjkkjj kkjjkkjj #%kkk kkk%# #%ttt mmm%#
I want the regex to match:
#%kkk kkk%# #%ttt mmm%#
I've tried this regex: /(?:#%.+?(?!%#).%#\s*){2,}/
But for some strange reason it matches the whole string and seems to ignore the negative lookahead.
Furthermore, my complete task is to match a sequence of phrases with up to 1 character between them (in addition to whitespace).
How can I implement this?
Test cases:
Text:
#%Prime target%# #%Online stuff%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#
Should match:
- #%Prime target%# #%Online stuff%#
- #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#
Text:
#%Prime target%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#
Should match:
- #%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#
Upvotes: 2
Views: 145
Reputation: 13679
Based on your test inputs, I came up with this regex; it's short and still effective:
/((?:#%[^#]*%#(?:\s.\s|\s)){2,})/g
test string
test 1
#%Prime target%# #%Online stuff%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#
test 2
#%Prime target%# English Deutsch Norsk Svenska Suomi English AU English CA #%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#
result
#%Prime target%# #%Online stuff%#
#%Home%# #%About Us%# #%Fair Play%# #%Promotions%#
#%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#
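To check the result above, here is a minimal sketch that runs the pattern against the two test strings separately. It is written in Python rather than PHP, assuming Python's re module treats these constructs (character classes, alternation, {2,}, \s) the same way PCRE does; re.finditer stands in for the /g flag, which PHP expresses with preg_match_all, and the redundant outer capture group is dropped. Because every phrase must be followed by whitespace, a sequence that ends exactly at the end of the input loses its last phrase unless a newline follows it. ANSWER_EOS is a hypothetical tweak, not part of the answer, that also accepts the end of input after a phrase.

```python
import re

TEST1 = ("#%Prime target%# #%Online stuff%# English Deutsch Norsk Svenska Suomi "
         "English AU English CA #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#")
TEST2 = ("#%Prime target%# English Deutsch Norsk Svenska Suomi English AU English CA "
         "#%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#")

# The answer's pattern without the /.../g wrapper.
ANSWER = r"(?:#%[^#]*%#(?:\s.\s|\s)){2,}"
# Hypothetical variant: a phrase may also be followed by the end of input.
ANSWER_EOS = r"(?:#%[^#]*%#(?:\s.\s|\s|$)){2,}"

def sequences(pattern, text):
    """Return every match with the trailing separator whitespace stripped."""
    return [m.group(0).strip() for m in re.finditer(pattern, text)]

# Without a trailing newline, #%Promotions%# is left out of the second match.
print(sequences(ANSWER, TEST1))
# With a trailing newline (as in a multi-line demo input) it is included.
print(sequences(ANSWER, TEST1 + "\n"))
# The end-of-input variant includes it either way.
print(sequences(ANSWER_EOS, TEST1))
print(sequences(ANSWER_EOS, TEST2))
```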
Upvotes: 1
Reputation: 11051
You have to modify your regex:
(?:#%(?:(?!%#).)+?.%#\s*)(?:.?\s*#%(?:(?!%#).)+?.%#\s*)+
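Wrapping the negative lookahead and the dot together in a (?: ) group, as in (?:(?!%#).)+?, makes the lookahead run before every character the lazy quantifier consumes, so a phrase can never extend past a %#. In your original regex the lookahead is checked only once, at the position where .+? stops, so .+? can still run across several phrases; that is why your regex does not work.
Also, repeat the phrase pattern in a second group prefixed with .? so that one character is acceptable between phrases.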
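The difference between the two phrase tokens can be checked with a short sketch. It uses Python's re as a stand-in for PHP's PCRE, assuming both handle lazy quantifiers and negative lookahead the same way here; only the phrase token differs between the two patterns, and the {2,} repetition is the one from the question. The per-character construct is often called a tempered greedy token.

```python
import re

TEXT = "#%jjj jjj%# kkjjkkjj kkjjkkjj #%kkk kkk%# #%ttt mmm%#"

# Original token: (?!%#) is tested only where .+? stops, so .+? may
# swallow a "%#" and continue into the next phrase.
ORIGINAL = r"(?:#%.+?(?!%#).%#\s*){2,}"
# Tempered token: (?!%#) is tested before every character consumed,
# so a phrase can no longer run past its closing %#.
TEMPERED = r"(?:#%(?:(?!%#).)+?.%#\s*){2,}"

print(re.search(ORIGINAL, TEXT).group(0) == TEXT)  # True: spans the whole text
print(re.search(TEMPERED, TEXT).group(0))          # #%kkk kkk%# #%ttt mmm%#
```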
Test case:
#%jjj jjj%# kkjjkkjj kkjjkkjj #%kkk kkk%# #%ttt mmm%#
Match:
#%kkk kkk%# #%ttt mmm%#
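Here is a minimal sketch running the full pattern against the example above and the two test cases from the question. It is written in Python rather than PHP (re.finditer standing in for preg_match_all), assuming Python's re matches these constructs the same way PCRE does. Matches are stripped because the trailing \s* can leave a space at the end of a match.

```python
import re

# The answer's full pattern.
PATTERN = r"(?:#%(?:(?!%#).)+?.%#\s*)(?:.?\s*#%(?:(?!%#).)+?.%#\s*)+"

EXAMPLE = "#%jjj jjj%# kkjjkkjj kkjjkkjj #%kkk kkk%# #%ttt mmm%#"
TEST1 = ("#%Prime target%# #%Online stuff%# English Deutsch Norsk Svenska Suomi "
         "English AU English CA #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#")
TEST2 = ("#%Prime target%# English Deutsch Norsk Svenska Suomi English AU English CA "
         "#%Home%# | #%About Us%# | #%Fair Play%# | #%Promotions%#")

def sequences(text):
    """Return every phrase sequence found, without trailing whitespace."""
    return [m.group(0).strip() for m in re.finditer(PATTERN, text)]

for text in (EXAMPLE, TEST1, TEST2):
    print(sequences(text))
```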
Upvotes: 1
Reputation: 111869
I think you want:
/(?:.*?#%.*?%#.*?)(#%.*%#)/g
It finds the first #%...%# (lazily) and then captures from the next #% up to the last %# in the text (greedily).
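A quick check of what the capture group holds, sketched in Python's re as a stand-in for PHP's PCRE (in PHP, preg_match_all would report the same group, assuming identical matching for these constructs). On the simple example the group is the expected pair of phrases; on the question's first test case, the greedy .* extends the group to the last %# in the whole text, so it also contains the plain words between the phrases.

```python
import re

# The answer's pattern without the /.../g wrapper; findall returns group 1.
PATTERN = r"(?:.*?#%.*?%#.*?)(#%.*%#)"

EXAMPLE = "#%jjj jjj%# kkjjkkjj kkjjkkjj #%kkk kkk%# #%ttt mmm%#"
TEST1 = ("#%Prime target%# #%Online stuff%# English Deutsch Norsk Svenska Suomi "
         "English AU English CA #%Home%# #%About Us%# #%Fair Play%# #%Promotions%#")

print(re.findall(PATTERN, EXAMPLE))  # ['#%kkk kkk%# #%ttt mmm%#']
# The single captured group runs from "#%Online stuff" to the end of TEST1.
print(re.findall(PATTERN, TEST1) == [TEST1[TEST1.index("#%Online stuff"):]])  # True
```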
Upvotes: 0