Reputation: 5783
In a block of text I want to find everything from a starting word up to some closing text, without the match spanning another occurrence of the starting word.
Example text:
Templates : You can add custom templates for your theme. Updated on 2010 look[124] end
Media RSS feed : Add the Cooliris Effect to your gallery Updated on 2011 look[124]
Role settings : Each gallery has a author Updated at 2010 ... look[124] end
AJAX based thumbnail generator : No more server Updated on 2010 look[124] end limitation during the batch process Copy/Move : Copy or move images between Updated on 2010 this look[124] galleries Sortable Albums : Create your own sets of images Updated on 2010 this look[124] end
Upload or pictures via a zip-file (Not in Safe-mode)
Watermark function : You can add a watermark image or text
...
I need to find "Updated .*[124] end": every match must start with "Updated" and end with "[number]" followed by the word "end". Some text looks very similar but does not end with the word "end"; that text must not match. How can I make this work?
I tried
/Updated(.*?)\[.*?\]\send/msi
or
Updated(.*?)\[.*?\](?!Updated)\send
But these also match strings like:
Updated on 2011 look[124] Role settings : Each gallery has a author Updated at 2010 ... look[124] end
Updated on 2010 this look[124] galleries Sortable Albums : Create your own sets of images Updated on 2010 this look[124] end
How do I write a regex which skips the bad matches?
Thanks for your opinion.
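To reproduce the problem, here is a minimal Python sketch of the first attempted pattern. Python is used only for illustration, the sample is cut down from the example text, and the names SAMPLE, LAZY and lazy_matches are made up. The lazy quantifier keeps a match short only relative to where it starts. When the tail fails after the second "Updated", the engine extends that match forward rather than giving up on that starting point.

```python
import re

# Cut-down sample from the question: the second entry has no trailing "end".
SAMPLE = (
    "Templates : You can add custom templates for your theme. Updated on 2010 look[124] end\n"
    "Media RSS feed : Add the Cooliris Effect to your gallery Updated on 2011 look[124]\n"
    "Role settings : Each gallery has a author Updated at 2010 ... look[124] end\n"
)

# First attempted pattern with the s and i flags
# (m changes nothing here because the pattern has no ^ or $).
LAZY = re.compile(r"Updated(.*?)\[.*?\]\send", re.S | re.I)


def lazy_matches(text):
    """Return the full text of every match of the lazy pattern."""
    return [m.group(0) for m in LAZY.finditer(text)]


for found in lazy_matches(SAMPLE):
    # The second result runs from the entry without "end" into the next entry.
    print(repr(found))
```

The second result contains "Updated" twice, which is exactly the unwanted match described above.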
Upvotes: 3
Views: 119
Reputation: 2341
Maybe you can try a different approach:
/Updated[\w.\s]*\[\d+\]\send/
Explanation:
Updated
This will match the word Updated
[\w.\s]*
then any run of word characters (letters, digits, underscore), whitespace and dots (you can add any other characters you need)
\[\d+\]
then a number between brackets
\send
then a whitespace character and finally the word end
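A quick way to check this is a small Python sketch (Python is an assumption here, and SAMPLE, PATTERN and class_matches are made-up names). It works because "[" is not in the character class, so the match cannot run past the first bracket after "Updated".

```python
import re

# Cut-down sample from the question: the second entry has no trailing "end".
SAMPLE = (
    "Templates : You can add custom templates for your theme. Updated on 2010 look[124] end\n"
    "Media RSS feed : Add the Cooliris Effect to your gallery Updated on 2011 look[124]\n"
    "Role settings : Each gallery has a author Updated at 2010 ... look[124] end\n"
)

# Character-class pattern from this answer, without the / delimiters.
PATTERN = re.compile(r"Updated[\w.\s]*\[\d+\]\send")


def class_matches(text):
    """Return every full match; the pattern has no groups, so findall gives whole matches."""
    return PATTERN.findall(text)


for found in class_matches(SAMPLE):
    print(repr(found))
```

The price of this approach is that any character outside the class between "Updated" and the bracket (a "/" or ":" for instance) makes the entry fail to match, so the class has to cover everything your data can contain there.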
Upvotes: 0
Reputation: 75272
I think this is what you were trying for with your second regex:
Updated\s++(?>(?!Updated\b|end\b)\S+\s+)*+end\b
In other words, match Updated and look for the corresponding end. If you find another Updated first, you know you started at the wrong place, so abandon that match. I excluded end as well because that lets me match the words possessively (i.e., with *+); the regex never has to backtrack to find or (more importantly) eliminate a match.
If you really have to specify the look[nnn] part, this should do the trick:
Updated\s++(?>(?!Updated\b|end\b|look\[\d+\])\S+\s+)*+look\[\d+\]\s+end\b
Add the i flag for a case-insensitive match if you need to, but you don't need the m or s flags. If this seems overly complicated, it's because I don't know your data as well as you do. There's a good chance this is all you really need:
Updated(?:(?!Updated).)*\send
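Here is a Python sketch of the look[nnn] version, with one assumption to flag: Python's re module only accepts possessive quantifiers and atomic groups from version 3.11 on, so the sketch swaps ++, *+ and (?>...) for ordinary backtracking equivalents. It should match the same text, just with more backtracking on failed attempts. LINE, PAIRED and paired_matches are made-up names, and the sample is the long line from the question.

```python
import re

# The long line from the question: the middle entry has no "end" after look[124].
LINE = (
    "AJAX based thumbnail generator : No more server Updated on 2010 look[124] end "
    "limitation during the batch process Copy/Move : Copy or move images between "
    "Updated on 2010 this look[124] galleries Sortable Albums : Create your own sets "
    "of images Updated on 2010 this look[124] end"
)

# Word-by-word scan: each word may not be Updated, end, or look[nnn].
# Possessive/atomic syntax replaced by plain quantifiers for older Pythons.
PAIRED = re.compile(
    r"Updated\s+(?:(?!Updated\b|end\b|look\[\d+\])\S+\s+)*look\[\d+\]\s+end\b"
)


def paired_matches(text):
    """Return every 'Updated ... look[nnn] end' run that contains no second Updated."""
    return PAIRED.findall(text)


for found in paired_matches(LINE):
    print(repr(found))
```

Because every loop iteration consumes one whole word plus its trailing whitespace, the scan stops at the first look[nnn] and fails cleanly when "end" does not follow it.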
Upvotes: 1
Reputation: 36292
One possibility:
Updated([^[]*)\[124\]\s+end
Explanation:
Updated   # Word 'Updated'
[^[]*     # All chars until '['
\[124\]   # String '[124]'
\s+       # One or more spaces.
end       # String 'end'
Upvotes: 0
Reputation: 33928
To match a string that does not contain Updated you can use constructs like:
(?:[^U]+|U(?!pdated))*
and
(?:(?!Updated).)*
Using the first alternative would give you an expression like:
Updated((?:[^U]+|U(?!pdated))*)\[\d+\]\send
First alternative explained:
(?: # non-capturing group
[^U]+ # any characters that aren't "U"
|U(?!pdated) # or a "U" which is not followed by "pdated" (i.e. not "Updated")
)* # repeated as much as possible
Second alternative:
(?: # non-capturing group
(?!Updated). # Use a lookahead check at every character to make sure it's not "Updated"
)* # repeated as much as possible
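As a rough check, here is a Python sketch built on the second construct. Choosing the second one is an assumption of this sketch: it consumes exactly one character per iteration, which keeps backtracking predictable in Python's engine, whereas the first nests a + inside a * and may backtrack heavily on attempts that fail. The s flag (re.S) is also an assumption, added so that "." can cross line breaks. SAMPLE, TEMPERED and tempered_matches are made-up names.

```python
import re

# Cut-down sample from the question: the second entry has no trailing "end".
SAMPLE = (
    "Templates : You can add custom templates for your theme. Updated on 2010 look[124] end\n"
    "Media RSS feed : Add the Cooliris Effect to your gallery Updated on 2011 look[124]\n"
    "Role settings : Each gallery has a author Updated at 2010 ... look[124] end\n"
)

# "Any character, as long as the word Updated does not start here", repeated,
# then the bracketed number and the word end.
TEMPERED = re.compile(r"Updated((?:(?!Updated).)*)\[\d+\]\send", re.S)


def tempered_matches(text):
    """Return the full text of every match that contains only one 'Updated'."""
    return [m.group(0) for m in TEMPERED.finditer(text)]


for found in tempered_matches(SAMPLE):
    print(repr(found))
```

The repeated group is greedy, so if two "[n] end" sequences appeared after the same "Updated" with no other "Updated" between them, the match would extend to the last one.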
Upvotes: 1
Reputation: 49148
Assuming all the invalid matches have a [124], but not an end, you can filter those out by not allowing a [ between Updated and the end sequence, like this:
Updated([^[]*?)\[\d*\]\send
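A small Python sketch of this, run against the long line from the question (Python is an assumption here, and LINE, NO_BRACKET and bracket_free_matches are made-up names). It returns both the whole match and the captured text between Updated and the bracket.

```python
import re

# The long line from the question: the middle entry has no "end" after look[124].
LINE = (
    "AJAX based thumbnail generator : No more server Updated on 2010 look[124] end "
    "limitation during the batch process Copy/Move : Copy or move images between "
    "Updated on 2010 this look[124] galleries Sortable Albums : Create your own sets "
    "of images Updated on 2010 this look[124] end"
)

# [^[] cannot cross a "[", so the first bracket after Updated
# must be the one that is followed by " end".
NO_BRACKET = re.compile(r"Updated([^[]*?)\[\d*\]\send")


def bracket_free_matches(text):
    """Return (whole match, captured group 1) for every match."""
    return [(m.group(0), m.group(1)) for m in NO_BRACKET.finditer(text)]


for whole, captured in bracket_free_matches(LINE):
    print(repr(whole), repr(captured))
```

The middle entry is skipped because its first bracket is followed by "galleries" and the pattern has no way to step over that bracket to reach a later one.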
Upvotes: 1