Reputation: 764

Get all text between each markdown heading

I'm trying to extract the text between the headings in a markdown file. The markdown file will look something like this:

### Description

This is a description

### Changelog

This is my changelog

### Automated Tests added

- Test 1
- Test 2

### Acceptance Tests performed



### Blurb

Concise summary of what this PR is.

Is there any way I can return all of the groups so that:

group 1 = "This is a description"
group 2 = "This is my changelog"

...and so on

Upvotes: 2

Answers (3)

Luc Gagan

Reputation: 874

You cannot use regex for this because regex has no (reasonable) way of knowing when a heading-like element is contained in a code block, e.g.

# This is a heading
```
# This is not a heading
```

"This is not a heading" is a heading-like element inside a code block.

To extract headings, you need to use a markdown parser and then walk the resulting AST.

This could be something as simple as:

import { remark } from 'remark';
import { visit } from 'unist-util-visit';

export type Heading = {
  depth: number;
  title: string;
};

export const extractMarkdownHeadings = async (
  content: string,
): Promise<Heading[]> => {
  const headings: Heading[] = [];

  await remark()
    .use(() => {
      return (root) => {
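        visit(root, 'heading', (node) => {
          // An empty heading (e.g. a bare "#") has no children, so guard the lookup.
          const first = node.children[0];
          headings.push({
            depth: node.depth,
            // Only text-like children carry a `value`; fall back to 'N/A' otherwise.
            title: first && 'value' in first ? first.value : 'N/A',
          });
        });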
      };
    })
    .process(content);

  return headings;
};
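
The function above returns the headings themselves, while the question asks for the text under each heading. As a rough, dependency-free sketch of that step (not a full markdown parser and not part of remark), the following scans line by line, tracks whether it is inside a fenced code block, and groups the remaining lines under the most recent heading. The names `Section` and `extractSections` are hypothetical, and it assumes ATX-style headings (`#` to `######`) and simple backtick or tilde fences only.

```typescript
// Hypothetical helper: a simplified line scanner, not a CommonMark-compliant parser.
export type Section = {
  heading: string;
  text: string;
};

export const extractSections = (content: string): Section[] => {
  const sections: { heading: string; lines: string[] }[] = [];
  let current: { heading: string; lines: string[] } | null = null;
  let fence: string | null = null; // '`' or '~' while inside a fenced code block

  for (const line of content.split(/\r?\n/)) {
    // Assumption: a fence is any line starting with three or more ` or ~ characters.
    const fenceMatch = /^\s*(`{3,}|~{3,})/.exec(line);
    if (fenceMatch) {
      const marker = fenceMatch[1][0];
      if (fence === null) {
        fence = marker; // opening fence
      } else if (marker === fence) {
        fence = null; // closing fence of the same kind
      }
      current?.lines.push(line);
      continue;
    }

    // Heading-like lines inside a fence are treated as ordinary text.
    const headingMatch = fence === null ? /^#{1,6}\s+(.*)$/.exec(line) : null;
    if (headingMatch) {
      current = { heading: headingMatch[1].trim(), lines: [] };
      sections.push(current);
    } else if (current) {
      current.lines.push(line);
    }
  }

  // Join each section's lines and trim the surrounding blank lines.
  return sections.map(({ heading, lines }) => ({
    heading,
    text: lines.join('\n').trim(),
  }));
};
```

A section with nothing under its heading comes back with an empty `text`, so the groups stay aligned with the headings. Text that appears before the first heading is dropped in this sketch.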

Upvotes: 4

mohsyn

Reputation: 298

This pattern gets all the matches except the last one.

Regex match pattern: (###[\h\w\t\n.]*)((###)?\n)

Check the sample here:
https://regex101.com/r/eBtSTM/2

Upvotes: 0

Reza Saadati

Reputation: 5429

You can use ^[^#]+ with the multiline flag, so that ^ matches at the start of every line. That will skip any line that starts with #. If you want to have groups, you may use ^([^#]+).

Note that the matches include line breaks. If you don't want them, you can exclude them as well with ^([^#\n]+).
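
As a minimal sketch of how that pattern might be applied in TypeScript (the helper name `textBetweenHeadings` is hypothetical, and the `g` and `m` flags are an assumption, since the answer does not name a regex flavor), each match can be collected and trimmed:

```typescript
// Hypothetical helper: collects every run of text that starts at a line start
// and contains no '#' character, then trims the surrounding line breaks.
export const textBetweenHeadings = (content: string): string[] => {
  // 'm' lets ^ match at each line start; 'g' lets exec() walk through all matches.
  const pattern = /^([^#]+)/gm;
  const groups: string[] = [];
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(content)) !== null) {
    groups.push(match[1].trim());
  }
  return groups;
};
```

Because the character class stops at the first `#`, a `#` anywhere in the body text cuts that group short, and heading-like lines inside code blocks are still treated as headings.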


Upvotes: 2
