Reputation: 22810
Hello, I'm trying to extract text from HTML using a regex:
([a-zA-Z0-9\:\[\]\40\.\'\,\?\"\&\(\/\)\-\“\”\’\@]){600,} // assume the text is longer than 600 characters
The problem is that I want to allow \n in my regex, but with at most two newlines, e.g.:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis dictum metus ipsum, ut hendrerit sem consectetur quis.
\n
Nunc tincidunt mi nisl, in lobortis diam pulvinar vel. Nulla at tempus enim, sit amet viverra nisl.
\n
Nunc tincidunt mi nisl, in lobortis diam pulvinar vel. Nulla at tempus enim, sit amet viverra nisl.
\n
\n
Not this
It should only match the first three lines, so I get something like:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis dictum metus ipsum, ut hendrerit sem consectetur quis.
\n (ok)
Nunc tincidunt mi nisl, in lobortis diam pulvinar vel. Nulla at tempus enim, sit amet viverra nisl.
\n (ok, still one)
.....
\n (ok, still one)
\n (oops, that's more than one, so stop the group here)
The result should be:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis dictum metus ipsum, ut hendrerit sem consectetur quis.
Nunc tincidunt mi nisl, in lobortis diam pulvinar vel. Nulla at tempus enim, sit amet viverra nisl.
Nunc tincidunt mi nisl, in lobortis diam pulvinar vel. Nulla at tempus enim, sit amet viverra nisl.
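For illustration, a minimal Python sketch of why the pattern above stops at every line break: \n is not in the character class, so each paragraph is matched on its own. It assumes Python's re module, the simplified character class from the lower answer, a non-capturing form of the pattern, and a threshold lowered from 600 to 20 so a short sample can match; the names ALLOWED, original and text are hypothetical.

```python
import re

# Simplified allowed-character class (assumption: same set as the question,
# written without the redundant escapes). Note: no \n in the class.
ALLOWED = r"""[a-zA-Z0-9: \[\].',?"&(/)“”’@-]"""

# Threshold lowered from 600 to 20 for the short sample (hypothetical value).
original = re.compile(ALLOWED + r"{20,}")

text = "First paragraph here.\nSecond paragraph here.\n\nNot this"

# Each line is a separate match because the class cannot consume "\n".
print(original.findall(text))  # → ['First paragraph here.', 'Second paragraph here.']
```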
Upvotes: 0
Views: 192
Reputation: 3404
A possible solution would be:
([a-zA-Z0-9\:\[\]\40\.\'\,\?\"\&\(\/\)\-\“\”\’\@]\n?){600,}
Two things:
The newlines will not be counted toward the character limit (600 in your case).
This will not work in your example because it has less than 600 characters (it has about 330).
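A minimal Python sketch of this pattern, assuming Python's re module, a simplified version of the character class, a non-capturing group (?:...) instead of the capturing one, and a threshold of 20 instead of 600 so the short sample matches. The names ALLOWED, pattern and text are hypothetical. It shows both points: a single newline is absorbed, a double newline ends the match, and the newlines do not count toward the repetition limit.

```python
import re

# Simplified allowed-character class (assumption, same set as the question).
ALLOWED = r"""[a-zA-Z0-9: \[\].',?"&(/)“”’@-]"""

# One allowed character, optionally followed by ONE newline, repeated.
# A newline must always follow an allowed character, so "\n\n" can never
# be consumed and the match stops there.
# Threshold lowered from 600 to 20 for the short sample (hypothetical value).
pattern = re.compile(r"(?:" + ALLOWED + r"\n?){20,}")

text = "First paragraph here.\nSecond paragraph here.\n\nNot this"
m = pattern.search(text)
print(repr(m.group()))  # → 'First paragraph here.\nSecond paragraph here.\n'

# Only the allowed characters count toward the {20,} limit, not the newlines.
print(len(m.group()) - m.group().count("\n"))  # → 43
```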
Upvotes: 0
Reputation: 336148
This is a job for a negative lookahead assertion:
[a-zA-Z0-9: \[\].',?"&(/)“”’@-]{600,}\n\n(?!\n)
It matches 600 or more of your allowed characters, followed by two newlines, but only if no further newline comes after them.
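A minimal Python sketch of the negative lookahead, assuming Python's re module and a threshold of 20 instead of 600 so short samples can be tested; the names ALLOWED, pattern, two and three are hypothetical. Text followed by exactly two newlines matches; the same text followed by three does not.

```python
import re

# Character class as written in this answer (no \n inside it).
ALLOWED = r"""[a-zA-Z0-9: \[\].',?"&(/)“”’@-]"""

# Threshold lowered from 600 to 20 for short samples (hypothetical value).
# (?!\n) rejects the match if a third newline follows the two consumed ones.
pattern = re.compile(ALLOWED + r"{20,}\n\n(?!\n)")

two = "Some allowed text here\n\nNext"
three = "Some allowed text here\n\n\nNext"

print(bool(pattern.search(two)))    # → True
print(bool(pattern.search(three)))  # → False
```

Because \n is not in the class, the run of allowed characters itself cannot contain a single line break; only the two trailing newlines are matched.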
Upvotes: 2