Regex for matching text between two regex patterns

Question

I am looking for a way to capture text and its paragraph title from a text document.

Text File:

paraTitle-1
--------
Lines and words
empty....
more lines



still part of paraTitle-1

paraTitle-2
--------
Lines and words
empty....
more lines



still part of paraTitle-2

I want to capture both the titles and the text below them.

 array = [paraTitle-1: ,
          paraTitle-2: ]

I made a few attempts with pattern (?<=(.*)) ----* (?=(.*)) to no avail. Any guidance would be awesome.

Andreas · Accepted Answer

The following regex will do:

(?!--------\R)(.*)\R--------\R((?:\R?(?!.*\R--------\R).*)+)

See regex101.

The title separator line (--------) can also be specified as -{8}, which is easier to adjust to variable length if needed, e.g. instead of exactly 8 dashes, it could be 6 or more: -{6,}
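
As a rough illustration of how the pattern might be applied, here is a minimal Python sketch. Python's built-in `re` module has no `\R`, so the sketch uses a plain `\n` instead and assumes the text was read in Python's default text mode, where line endings are already normalized to `\n`. The helper name `split_paragraphs`, its `sep` parameter, and the sample text are made up for this example.

```python
import re

def split_paragraphs(text, sep=r"-{8}"):
    """Map each title to the text below it (hypothetical helper).

    `sep` is the regex for the separator line; "-{8}" means exactly 8 dashes.
    """
    # Same structure as the regex above, with \R replaced by \n
    # (assumption: line endings in `text` are already "\n").
    pattern = re.compile(
        rf"(?!{sep}\n)(.*)\n{sep}\n((?:\n?(?!.*\n{sep}\n).*)+)"
    )
    # strip() drops the blank lines that separate one paragraph from the next
    return {m.group(1): m.group(2).strip() for m in pattern.finditer(text)}

sample = """paraTitle-1
--------
Lines and words
more lines

still part of paraTitle-1

paraTitle-2
--------
Lines and words
"""

paragraphs = split_paragraphs(sample)
print(list(paragraphs))  # ['paraTitle-1', 'paraTitle-2']
```

Calling `split_paragraphs(text, sep=r"-{6,}")` would accept separator lines of six or more dashes. Note that a dict keeps only the last paragraph if two titles are identical; a list of pairs would avoid that.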

Explanation:

Capture a line of text (paragraph title):
```
(.*)\R
```
- The . doesn't match line break characters
- \R matches line breaks, including the Windows CRLF pair. If your regex engine doesn't support \R, use \r?\n as a simple alternative.
Make sure the captured text is not the title separator line:
```
(?!--------\R)
```
Skip the mandatory title separator line:
```
--------\R
```
Capture the paragraph text, as a repeating group of lines (xxx is a placeholder for the per-line pattern described next):
```
((?:xxx)+)
```
A line has an optional leading line break (first line doesn't have one):
```
\R?.*
```
But make sure the line is not the title of the next paragraph, i.e. it's not a line followed by the title separator line.
```
(?!.*\R--------\R)
```
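
The pieces above can also be put back together with one comment per piece. The following is only a sketch in Python, using `re.VERBOSE` so that whitespace and `#` comments inside the pattern are ignored. It assumes `\n` line endings in place of `\R` (which Python's `re` module does not support); the name `PARAGRAPH` and the sample text are invented for the example.

```python
import re

PARAGRAPH = re.compile(
    r"""
    (?!--------\n)            # the title must not be a separator line itself
    (.*)\n                    # group 1: the paragraph title and its line break
    --------\n                # the mandatory separator line, skipped
    (                         # group 2: the paragraph text
      (?:                     #   a repeating group of lines
        \n?                   #   optional leading line break
        (?!.*\n--------\n)    #   this line must not be the next title
        .*                    #   the line itself
      )+
    )
    """,
    re.VERBOSE,
)

text = "Intro\n--------\nfirst line\nsecond line\nNext\n--------\nlast line"

# Collect (title, text) pairs in document order
pairs = [(m.group(1), m.group(2)) for m in PARAGRAPH.finditer(text)]
print(pairs)
```

Here the first pair holds the title `Intro` with both of its lines, and the second holds `Next` with `last line`: the negative lookahead stops the first paragraph just before the line that precedes the next separator.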
