Reputation: 764
I'm trying to extract the text between the headings in a markdown file. The markdown file will look something like this:
```markdown
### Description
This is a description
### Changelog
This is my changelog
### Automated Tests added
- Test 1
- Test 2
### Acceptance Tests performed
### Blurb
Concise summary of what this PR is.
```
Is there any way I can return all of the groups so that:
...and so on
Upvotes: 2
Views: 1114
Reputation: 874
You cannot use regex for this because regex has no (reasonable) way of knowing when a "heading like" element is contained in a code block, e.g.
# This is a heading
```
# This is not a heading
```
"This is not a heading" is a heading-like element inside a code block.
To extract headings, you need to use a Markdown parser and then walk the resulting AST.
This could be something as simple as:
```typescript
import { remark } from 'remark';
import { visit } from 'unist-util-visit';

export type Heading = {
  depth: number;
  title: string;
};

export const extractMarkdownHeadings = async (
  content: string,
): Promise<Heading[]> => {
  const headings: Heading[] = [];

  await remark()
    .use(() => {
      // Inline plugin: walk the parsed tree and collect every heading node.
      return (root) => {
        visit(root, 'heading', (node) => {
          // An empty heading has no children, so guard before reading the first one.
          const first = node.children[0];
          headings.push({
            depth: node.depth,
            title: first && 'value' in first ? first.value : 'N/A',
          });
        });
      };
    })
    .process(content);

  return headings;
};
```
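Note that the question asks for the text between the headings, not only the headings themselves. If pulling in a parser is not an option, a rough dependency-free approximation is to scan line by line and ignore anything inside a fenced code block. The sketch below is only that: `Section` and `splitSections` are hypothetical names, nothing here is part of remark, and it assumes ATX headings (`#` to `######`) and backtick or tilde fences only. It does not handle indented code blocks, setext headings, or HTML blocks, so a real parser remains the safer choice.

```typescript
// Hypothetical helper (not part of remark): a fence-aware line scanner.
type Section = {
  depth: number;
  title: string;
  body: string;
};

const splitSections = (content: string): Section[] => {
  const sections: Section[] = [];
  let current: Section | null = null;
  let bodyLines: string[] = [];
  let fence: string | null = null; // marker that opened the current code block, if any

  // Store the collected body on the open section and start a new buffer.
  const flush = (): void => {
    if (current) {
      current.body = bodyLines.join('\n').trim();
      sections.push(current);
    }
    bodyLines = [];
  };

  for (const line of content.split(/\r?\n/)) {
    const marker = /^ {0,3}(`{3,}|~{3,})(.*)$/.exec(line);
    if (marker) {
      if (fence === null) {
        fence = marker[1]; // opening fence
      } else if (
        marker[1][0] === fence[0] &&
        marker[1].length >= fence.length &&
        marker[2].trim() === ''
      ) {
        fence = null; // matching closing fence
      }
      bodyLines.push(line);
      continue;
    }

    // Only look for headings while we are outside a fenced block.
    const heading =
      fence === null
        ? /^ {0,3}(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$/.exec(line)
        : null;
    if (heading) {
      flush();
      current = { depth: heading[1].length, title: heading[2], body: '' };
    } else {
      bodyLines.push(line); // lines before the first heading are dropped by flush()
    }
  }
  flush();
  return sections;
};

// Usage: the heading-like line inside the fence stays in the body of the first section.
const demo = [
  '# This is a heading',
  '```',
  '# This is not a heading',
  '```',
  '### Changelog',
  'This is my changelog',
].join('\n');
const demoSections = splitSections(demo);
console.log(demoSections.map((s) => s.title));
```

Text that appears before the first heading is discarded here; whether that is acceptable depends on the files being processed.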
Upvotes: 4
Reputation: 298
This pattern gets all the matches except the last one.
Regex match pattern: `(###[\h\w\t\n.]*)((###)?\n)`
Check the sample here:
https://regex101.com/r/eBtSTM/2
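For what it's worth, one possible way to also pick up the last section is to let the body run lazily up to either the next `###` line or the end of the input. The following is a sketch in TypeScript (JavaScript regex flavour, not the PCRE flavour of the pattern above); `prDescription`, `sectionPattern` and `sectionsByTitle` are names made up for this example, and it assumes `\n` line endings and level-3 headings only. It still cannot tell a real heading from a `###` line inside a code block.

```typescript
// Sample input from the question; '\n' line endings are assumed.
const prDescription = [
  '### Description',
  'This is a description',
  '### Changelog',
  'This is my changelog',
  '### Automated Tests added',
  '- Test 1',
  '- Test 2',
  '### Acceptance Tests performed',
  '### Blurb',
  'Concise summary of what this PR is.',
].join('\n');

// Group 1: the heading text. Group 2: a lazy body that stops right before the
// next "### " line or at the end of the input ((?![\s\S]) means "no character follows").
const sectionPattern =
  /^###[ \t]+(.+)(?:\n|$)([\s\S]*?)(?=^###[ \t]|(?![\s\S]))/gm;

const sectionsByTitle = new Map<string, string>();
for (const match of prDescription.matchAll(sectionPattern)) {
  sectionsByTitle.set(match[1].trim(), match[2].trim());
}

console.log(sectionsByTitle.get('Automated Tests added'));
console.log(sectionsByTitle.get('Blurb'));
```

A heading with nothing under it, such as "Acceptance Tests performed" in the sample, ends up with an empty string as its body.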
Upvotes: 0
Reputation: 5429
You can use `^[^#]+`. That will exclude a line that starts with `#`. If you want to have groups, you may use `^([^#]+)`.
Note that the matches include line breaks. If you don't want them, you can exclude them as well with `^([^#\n]+)`.
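As a quick illustration, here is how the two patterns behave on the sample from the question when run in TypeScript. This is a sketch that assumes the global and multiline flags (`gm`), so that `^` matches at the start of every line; `prBody`, `blocks` and `singleLines` are names invented for the example.

```typescript
// Sample input from the question; '\n' line endings are assumed.
const prBody = [
  '### Description',
  'This is a description',
  '### Changelog',
  'This is my changelog',
  '### Automated Tests added',
  '- Test 1',
  '- Test 2',
  '### Acceptance Tests performed',
  '### Blurb',
  'Concise summary of what this PR is.',
].join('\n');

// 'g' finds every match; 'm' lets ^ match at the start of each line.
// Each match runs from a line start up to the next '#', line breaks included.
const blocks = Array.from(prBody.matchAll(/^([^#]+)/gm), (m) => m[1]);

// Variant that also excludes line breaks, so each match is a single line.
const singleLines = Array.from(prBody.matchAll(/^([^#\n]+)/gm), (m) => m[1]);

console.log(blocks.length); // 4
console.log(singleLines.length); // 5
```

Two things to keep in mind: the empty "Acceptance Tests performed" section produces no match at all, and a `#` anywhere in the body text (an issue reference, for example) cuts the match short, because the pattern excludes every `#` and not only those at the start of a line.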
Upvotes: 2