Reputation: 246
I need to come up with a regular expression with flavor PCRE. It must be a regular expression.
I want to grab all lines of text that end in a newline character up until I encounter <zz>, where zz is a digit enclosed in '<' and '>'.
e.g.
111a z
222 aset
333 //+
12 <zz> 11
abc
def
It would need to capture "111a z\n", "222 aset\n", "333 //+\n" in this case [and nothing else].
Right now I have ^(?!.*<zz>)[^\n]+(?=\n)
but it's pretty far off from what it needs to be.
For clarification, the regex I was using shows <zz>, but I am definitely looking for a digit enclosed in angle brackets.
Would really appreciate some help.
Edit
This is /really/ difficult for me, because at least one of the answers looks like it does the job. I'll try to mark one... Thank you, everyone.
Upvotes: 2
Views: 2200
Reputation: 163342
You could repeatedly match whole lines, each including its Unicode newline sequence, as long as the <\d+> pattern does not occur in the line.
\A(?:(?!.*<\d+>).*\R)+
Explanation
\A            Start of string
(?:           Non-capturing group
(?!.*<\d+>)   Negative lookahead, assert that the pattern <\d+> does not occur in the line
.*\R          Match any char except a newline, followed by a Unicode newline sequence
)+            Close the non-capturing group and repeat it 1+ times to match at least a single line
If the <\d+> has to be present, you could assert that with a positive lookahead at the end:
\A(?:(?!.*<\d+>).*\R)+(?=.*<\d+>)
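As a rough illustration, the first pattern can be tried from Python. This is only a sketch under some assumptions: Python's `re` module is not PCRE and has no `\R`, so an explicit CRLF/LF/CR alternation stands in for it, and the `<45>` marker in the sample text is a made-up stand-in for the `<zz>` placeholder from the question.

```python
import re

# Assumption: Python's re has no \R, so this alternation approximates it.
NEWLINE = r"(?:\r\n|\n|\r)"

# \A anchors at the start of the string; each repetition consumes one full
# line (including its newline) as long as that line contains no <digits>.
LEADING_LINES = re.compile(r"\A(?:(?!.*<\d+>).*" + NEWLINE + r")+")

# Hypothetical sample: <45> replaces the <zz> placeholder from the question.
sample = "111a z\n222 aset\n333 //+\n12 <45> 11\nabc\ndef\n"

match = LEADING_LINES.search(sample)
leading = match.group(0) if match else ""

# Split the single match back into individual lines, keeping the newlines.
print(leading.splitlines(keepends=True))
```

Because the whole run of lines comes back as one match, it has to be split afterwards if the individual lines are needed. If the very first line already contains the marker, there is no match at all.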
Upvotes: 2
Reputation: 110675
I have assumed that the text may have more than one line that contains one or more digits bracketed in '<' and '>', and that those lines are not themselves to be matched.
You can use the following expression to match the lines of interest.
^(?!.*<\d+>).*\r?\n(?=[\s\S]*?<\d+>)
The regex engine performs the following operations.
^ match beginning of line
(?! begin negative lookahead (prevent matching line with '<12>')
.* match 0+ characters other than newlines
<\d+> match '<', 1+ digits, '>'
) end negative lookahead
.* match 0+ characters other than newlines
\r?\n match newline optionally preceded by '\r'
(?= begin positive lookahead
[\s\S]*? match 0+ characters (incl. newlines), non-greedily
<\d+> match '<', 1+ digits, '>'
) end positive lookahead
'\r', a carriage return, will be present if the file was produced on the Windows operating system.
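A quick way to see how this expression behaves is a small Python sketch. The assumptions here: Python's `re` stands in for PCRE (the constructs used are supported by both), the multiline flag must be set so that `^` matches at every line start, and the `<45>`, `<1>` and `<2>` markers in the sample strings are invented for illustration.

```python
import re

# re.MULTILINE is required so that ^ matches at the start of every line.
PATTERN = re.compile(r"^(?!.*<\d+>).*\r?\n(?=[\s\S]*?<\d+>)", re.MULTILINE)

# Hypothetical sample: <45> replaces the <zz> placeholder from the question.
sample = "111a z\n222 aset\n333 //+\n12 <45> 11\nabc\ndef\n"

# Each match is one line (with its newline) that has no marker itself
# but is followed somewhere later by a line that does.
lines = PATTERN.findall(sample)
print(lines)

# With several marker lines, unmarked lines between them also match,
# while lines after the last marker do not.
several = PATTERN.findall("a\n<1>\nb\n<2>\nc\n")
print(several)
```

Unlike an expression anchored at the start of the string, this one yields one match per line, and it keeps matching unmarked lines after the first marker as long as another marker follows.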
Upvotes: 0
Reputation: 2342
I'm not sure why you're using a negative lookahead, but I think you want a positive lookahead. This lets you match the line only if you see the <zz> in a lookahead. I would solve the problem using something like this:
^.*(?=.*(?:\n.*)*<\d+>)\n
^             Anchors the match to the beginning of a line (like yours)
.*            Matches all the characters it can. In this case it matches the whole line because it has to satisfy the \n at the end
(?=...)       Performs a positive lookahead (makes sure the string exists somewhere ahead)
.*(?:\n.*)*   Allows any number of characters on any number of lines
<\d+>         Only matches one or more digits enclosed in angle brackets
\n            Ensures that there is a newline at the end of the line
Upvotes: 0