Ben

Reputation: 62364

PHP Regex matching paragraph break

I have the following text:

This is some test I want to keep\n
\n
**Header**  \n
Some test text \n
Another test  \n
Something else  \n
\n
 - bullet point i want to keep
 - this bulleted list could be another paragraph or header, etc.

What I'm trying to do is come up with a regex that'll match...

**Header**  \n
Some test text \n
Another test  \n
Something else  \n
\n

...so that I can strip out that segment of the contents. It's safe for me to identify the segment by its header text (in this case Header), but I can't assume it is followed by a bulleted list; it could be followed by anything, so I was planning to end the match at two consecutive newlines.

I've been trying to find a regex that does what I need. \*\*Header\*\*(?:.+?)(?:\r*\n){2,} looks like it should work, but I can't get it to match; see https://regex101.com/r/1OUgAV/1/.

Feels like I'm missing something silly here, can someone help me out?

Upvotes: 0

Views: 48

Answers (1)

The fourth bird

Reputation: 163277

In the demo text there is a literal \n (a backslash followed by n) instead of an actual newline. Beyond that, the pattern you tried will only match if the segment is followed by two newlines, so it will not match when the text ends right after Something else.
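A minimal PHP sketch of both points, assuming the sample text is held in double-quoted strings (so \n is a real newline). The variable names are hypothetical. It also adds the s modifier to the question's pattern, which is an assumption on my part, since without it the dot cannot cross a line break at all:

```php
<?php
// In PHP, '\n' (single quotes) is two characters: a backslash and an n.
// "\n" (double quotes) is one real newline character.
var_dump(strlen('\n'), strlen("\n")); // int(2) then int(1)

// The question's pattern, with the s modifier added so .+? may span lines.
$questionPattern = '/\*\*Header\*\*(?:.+?)(?:\r*\n){2,}/s';

// Segment followed by a blank line, and segment at the very end of the text.
$withBlankLine = "Intro\n\n**Header**  \nSome test text \nSomething else  \n\n - bullet";
$atEndOfText   = "Intro\n\n**Header**  \nSome test text \nSomething else  ";

var_dump(preg_match($questionPattern, $withBlankLine)); // int(1)
var_dump(preg_match($questionPattern, $atEndOfText));   // int(0): no two newlines to end on

// Without the s modifier, .+? stops at the first line break, so the
// {2,} newline run is never reached and even the first sample fails.
var_dump(preg_match('/\*\*Header\*\*(?:.+?)(?:\r*\n){2,}/', $withBlankLine)); // int(0)
```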

You don't need the s modifier to make the dot match a newline. Instead, you can match the block line by line, asserting at the start of each new line that what follows is not another newline (that is, the line is not empty).

If you want to match only the Header block, you could use

\*\*Header\*\*.*(?:\r?\n(?!\r?\n).*)*

Explanation

  • \*\*Header\*\* Match **Header**
  • .* Match any char except a newline 0+ times
  • (?: Non capture group
    • \r?\n Match a newline
    • (?!\r?\n).* Assert that what is directly to the right is not another newline, then match the whole line.
  • )* Close the group and repeat 0+ times
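A minimal PHP sketch of using the pattern with preg_match and preg_replace, assuming the text uses real newlines. The pattern only needs the / delimiters added. For the removal step, the trailing (?:\r?\n)* is my own addition (not part of the pattern above): it also consumes the blank line(s) after the block so that only one paragraph break remains:

```php
<?php
// The pattern from above, wrapped in the delimiters preg_* requires.
$pattern = '/\*\*Header\*\*.*(?:\r?\n(?!\r?\n).*)*/';

// Sample text from the question, with real newlines.
$text = "This is some test I want to keep\n"
      . "\n"
      . "**Header**  \n"
      . "Some test text \n"
      . "Another test  \n"
      . "Something else  \n"
      . "\n"
      . " - bullet point i want to keep\n"
      . " - this bulleted list could be another paragraph or header, etc.";

// Show which block is matched: from **Header** up to, but not including,
// the newline that starts the blank line.
if (preg_match($pattern, $text, $m)) {
    var_dump($m[0]);
}

// Remove the block plus the newlines that follow it (assumed addition).
$stripped = preg_replace('/\*\*Header\*\*.*(?:\r?\n(?!\r?\n).*)*(?:\r?\n)*/', '', $text);
echo $stripped, "\n";
```

Because the group repeats 0+ times and no trailing newlines are required, this also matches when the block is the last thing in the text.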

Regex demo

Upvotes: 1
