Adam Ramadhan

Reputation: 22810

Regex: match across single newlines, but stop at two consecutive newlines

Hello, I'm trying to extract text from HTML using a regex:

    ([a-zA-Z0-9\:\[\]\40\.\'\,\?\"\&\(\/\)\-\“\”\’\@]){600,}

(Let's say the text to match is more than 600 characters long.)

The problem is that I want to allow \n in my regex, but never two newlines in a row. For example:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis dictum metus ipsum, ut hendrerit sem consectetur quis.   
\n
Nunc tincidunt mi nisl, in lobortis diam pulvinar vel. Nulla at tempus enim, sit amet viverra nisl.
\n
Nunc tincidunt mi nisl, in lobortis diam pulvinar vel. Nulla at tempus enim, sit amet viverra nisl.
\n
\n
Not this 

It should match only the first three lines, so the matching goes something like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis dictum metus ipsum, ut hendrerit sem consectetur quis.   
\n (ok)
Nunc tincidunt mi nisl, in lobortis diam pulvinar vel. Nulla at tempus enim, sit amet viverra nisl.
\n (ok, still only one)
.....
\n (ok, still only one)
\n (oops, that's a second newline in a row, so the group stops here)

The result should be:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis dictum metus ipsum, ut hendrerit sem consectetur quis.   

Nunc tincidunt mi nisl, in lobortis diam pulvinar vel. Nulla at tempus enim, sit amet viverra nisl.

Nunc tincidunt mi nisl, in lobortis diam pulvinar vel. Nulla at tempus enim, sit amet viverra nisl.

Upvotes: 0

Views: 192

Answers (2)

durum

Reputation: 3404

A possible solution would be:

    ([a-zA-Z0-9\:\[\]\40\.\'\,\?\"\&\(\/\)\-\“\”\’\@]\n?){600,}

Two things:

  • The newlines will not be counted towards the character limit (600 in your case).

  • This will not match your example, because it has fewer than 600 characters (it has about 330).
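
To see how this behaves, here is a minimal Python sketch. Several things in it are my assumptions rather than part of the pattern above: Python's `re` module, a tidied spelling of the character class that should be equivalent, a non-capturing group, and a minimum of 20 instead of 600 so that a short sample can match.

```python
import re

# Tidied spelling of the question's character class (assumed equivalent).
ALLOWED = r"""[a-zA-Z0-9: \[\].',?"&(/)“”’@-]"""

# Each allowed character may be followed by at most one newline.
# The minimum is lowered from 600 to 20 for this demo (assumption).
pattern = re.compile(rf"(?:{ALLOWED}\n?){{20,}}")

text = "First line of text.\nSecond line of text.\n\nNot this"

match = pattern.search(text)
matched = match.group(0)

# Newlines are consumed by the optional \n?, so they do not count
# towards the {20,} repetition; only allowed characters do.
allowed_count = len(matched.replace("\n", ""))

print(repr(matched))
print(allowed_count)
```

The match stops at the second consecutive newline because `\n?` can consume only one newline per allowed character, and a newline on its own is not in the character class.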

Upvotes: 0

Tim Pietzcker

Reputation: 336148

This is a job for a negative lookahead assertion:

    [a-zA-Z0-9: \[\].',?"&(/)“”’@-]{600,}\n\n(?!\n)

This matches 600 or more of your allowed characters followed by two newlines, but only if no additional newline follows them: the lookahead (?!\n) makes the match fail when a third newline is found.
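
A small Python sketch of the lookahead in action. It assumes Python's `re` module and lowers the minimum from 600 to 20 so short test strings can match; the sample strings are made up for illustration.

```python
import re

# Character class copied from the pattern above.
ALLOWED = r"""[a-zA-Z0-9: \[\].',?"&(/)“”’@-]"""

# Minimum lowered from 600 to 20 for this demo (assumption).
pattern = re.compile(rf"{ALLOWED}{{20,}}\n\n(?!\n)")

two_newlines = "A line that is long enough.\n\nNext paragraph"
three_newlines = "A line that is long enough.\n\n\nNext paragraph"

# Exactly two newlines after the run of allowed characters: match.
hit = pattern.search(two_newlines)

# A third newline makes the negative lookahead fail: no match.
miss = pattern.search(three_newlines)

print(repr(hit.group(0)))
print(miss)
```

Note that the character class itself contains no newline, so in this form the run of allowed characters has to sit on a single line before the two newlines.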

Upvotes: 2
