aaaa

Reputation: 246

Regex that satisfies multiple criteria

I need to come up with a regular expression in the PCRE flavor. It must be a regular expression <

I want to grab all lines of text that end in a newline character up until I encounter <zz> where zz is a digit enclosed in '<' and '>'.

e.g.

111a z
222 aset
333 //+
12 <zz> 11
abc
def

It would need to capture "111a z", "222 aset", "333 //+" in this case [and nothing else]. Right now I have ^(?!.*<zz>)[^\n]+(?=\n) but it's pretty far off from what it needs to be.

For clarification purposes, the regex I was using shows <zz>, but definitely looking for a digit enclosed in angle brackets.

Would really appreciate some help.

Edit

This is /really/ difficult for me, because at least one of the answers looks like it does the job. I'll try to mark one... Thank you, everyone.

Upvotes: 2

Views: 2200

Answers (3)

The fourth bird

Reputation: 163342

You could repeatedly match whole lines, each including its trailing Unicode newline sequence, as long as the pattern <\d+> does not occur in the line.

\A(?:(?!.*<\d+>).*\R)+

Explanation

  • \A Start of string
  • (?: Non capture group
    • (?!.*<\d+>) Negative lookahead, assert that the pattern <\d+> does not occur
    • .*\R Match zero or more characters other than a newline, followed by a Unicode newline sequence
  • )+ Close the non capturing group, and repeat it 1+ times to match at least a single line
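
To try the pattern outside PCRE, here is a minimal Python sketch. Python's built-in re module does not support \R, so the sketch substitutes (?:\r\n|\n|\r) as an approximation. The helper name leading_lines and the sample text are made up for illustration.

```python
import re

# Approximation of PCRE's \R, which Python's re module does not support.
NEWLINE = r"(?:\r\n|\n|\r)"

# \A anchors at the start of the string; each repetition consumes one line
# that contains no <digits> marker, together with its newline.
PATTERN = re.compile(r"\A(?:(?!.*<\d+>).*" + NEWLINE + r")+")


def leading_lines(text):
    """Return the lines before the first line containing <digits> (hypothetical helper)."""
    match = PATTERN.match(text)
    return match.group(0).splitlines() if match else []


sample = "111a z\n222 aset\n333 //+\n12 <11> 11\nabc\ndef\n"
print(leading_lines(sample))  # ['111a z', '222 aset', '333 //+']
```

Note that when no marker line exists at all, this version still matches every newline-terminated line from the start of the text.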

If a <\d+> has to be present after the matched lines, you could assert that with a positive lookahead at the end:

\A(?:(?!.*<\d+>).*\R)+(?=.*<\d+>)
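
A rough Python check of this stricter variant, again assuming (?:\r\n|\n|\r) as a stand-in for \R, which Python's re module lacks. The function name lines_before_marker is hypothetical.

```python
import re

# Stand-in for PCRE's \R (an assumption; Python's re has no \R).
NEWLINE = r"(?:\r\n|\n|\r)"

# Marker-free lines from the start of the text, followed by a lookahead that
# requires a <digits> marker on the line right after the matched block.
PATTERN = re.compile(r"\A(?:(?!.*<\d+>).*" + NEWLINE + r")+(?=.*<\d+>)")


def lines_before_marker(text):
    """Return the leading lines only if a <digits> line follows them (hypothetical helper)."""
    match = PATTERN.match(text)
    return match.group(0).splitlines() if match else []


with_marker = "111a z\n222 aset\n333 //+\n12 <11> 11\nabc\ndef\n"
without_marker = "abc\ndef\n"
print(lines_before_marker(with_marker))     # ['111a z', '222 aset', '333 //+']
print(lines_before_marker(without_marker))  # []
```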

Upvotes: 2

Cary Swoveland

Reputation: 110675

I have assumed that the text may have more than one line that contains one or more digits bracketed in '<' and '>', and that those lines are not themselves to be matched.

You can use the following expression to match the lines of interest.

^(?!.*<\d+>).*\r?\n(?=[\s\S]*?<\d+>)

The regex engine performs the following operations.

^           match beginning of line
(?!         begin negative lookahead (prevent matching a line containing '<12>')
  .*        match 0+ characters other than newlines
   <\d+>    match '<', 1+ digits, '>'
)           end negative lookahead
.*          match 0+ characters other than newlines
\r?\n       match newline optionally preceded by '\r'
(?=         begin positive lookahead
  [\s\S]*?  match 0+ characters (incl. newlines), non-greedily
  <\d+>     match '<', 1+ digits, '>' 
)           end positive lookahead

A carriage return ('\r') will precede each '\n' if the file was produced on Windows.
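
This expression uses no PCRE-only syntax, so it can be tried as-is in Python. The sketch below assumes re.MULTILINE as the counterpart of PCRE's m flag, so that ^ matches at the start of each line. The function name and sample strings are made up for illustration.

```python
import re

# re.MULTILINE makes ^ match at the start of every line.
PATTERN = re.compile(r"^(?!.*<\d+>).*\r?\n(?=[\s\S]*?<\d+>)", re.MULTILINE)


def lines_followed_by_marker(text):
    """Return marker-free lines that have a <digits> marker somewhere after them."""
    # Each match is one whole line plus its line ending; strip the ending.
    return [m.group(0).rstrip("\r\n") for m in PATTERN.finditer(text)]


sample = "111a z\n222 aset\n333 //+\n12 <11> 11\nabc\ndef\n"
two_markers = "a\n<1>\nb\n<2>\nc\n"
print(lines_followed_by_marker(sample))       # ['111a z', '222 aset', '333 //+']
print(lines_followed_by_marker(two_markers))  # ['a', 'b']
```

With two marker lines, the lines between them are matched as well, while the marker lines themselves and anything after the last marker are skipped.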

Upvotes: 0

Charlie Armstrong

Reputation: 2342

I'm not sure why you're using a negative lookahead, but I think you want a positive lookahead. This lets you match a line only if a <zz> appears somewhere after it. I would solve the problem using something like this:

^.*(?=.*(?:\n.*)*<\d+>)\n
  • ^ Anchors match to beginning of line (like yours)
  • .* Matches all the characters it can. In this case it matches the whole line because it has to satisfy the \n at the end.
  • (?=...) Performs a positive lookahead (makes sure the string exists somewhere ahead)
  • .*(?:\n.*)* Allows any number of characters on any number of lines
  • <\d+> Only matches one or more digits enclosed in angle brackets
  • \n ensures that there is a newline at the end of the line.
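
A small Python sketch of this pattern, assuming re.MULTILINE plays the role of the multiline flag that lets ^ match at each line start. The function name and inputs are hypothetical.

```python
import re

# ^ needs multiline mode to anchor at the start of every line.
PATTERN = re.compile(r"^.*(?=.*(?:\n.*)*<\d+>)\n", re.MULTILINE)


def lines_with_marker_ahead(text):
    """Return every line that has a <digits> marker on some later line."""
    return [m.group(0).rstrip("\n") for m in PATTERN.finditer(text)]


sample = "111a z\n222 aset\n333 //+\n12 <11> 11\nabc\ndef\n"
two_markers = "a\n<1>\nb\n<2>\nc\n"
print(lines_with_marker_ahead(sample))       # ['111a z', '222 aset', '333 //+']
print(lines_with_marker_ahead(two_markers))  # ['a', '<1>', 'b']
```

Because nothing excludes lines that contain a marker themselves, an input with several markers also returns each marker line that has another marker after it. That may or may not be the desired behavior.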

Upvotes: 0
