Reputation: 141
I have the following snippet of markdown:
# Glossary
This guide is aimed to familiarize the users with definitions to relevant DVC
concepts and terminologies which are frequently used.
## Workspace directory
Also abbreviated as workspace, it is the root directory of a project where DVC
is initialized by running `dvc init` command. Therefore, this directory will
contain a `.dvc` directory as well.
## Cache directory
DVC cache is a hidden storage which is found at `.dvc/cache`. This storage is
used to manage different versions of files which are under DVC control. For more
information on cache, please refer to the this
[guide](/doc/commands-reference/config#cache).
I want to split it so that there are three matches, which should be:
# Glossary
...
## Workspace directory
...
## Cache directory
...
I tried to match them using the regex /#{1,2}\s.+\n{2}[^(#{2}\s)]*/. My intention was to match the heading first with the part #{1,2}\s.+\n{2} and then stop matching when ##\s is found. But I'm failing with the second part. Can anyone guide me?
Upvotes: 2
Views: 803
Reputation: 11
I know this is an old post but the subject matter remains relevant and I hope someone with more regex knowledge than me will see this comment and provide an update.
I have been using Wiktor's match regex to find headings and the subsequent text before the next heading.
It works well unless there is an h1 (#) header anywhere in the body of the text. If present, it will be “gobbled up” and become part of the previous section, since the regex effectively stops only when it sees two or more consecutive # characters, and a lone "# " doesn't match that criterion.
This will fail:
## header 2
some text
# header 1
some more text
## header 2b
The first match will be:
## header 2
some text
# header 1
some more text
instead of:
## header 2
some text
The regex seems to assume that there is only one h1 (#) header and that no other heading precedes it. When that holds, I have found no issues.
To be honest this isn't a real issue in practice for me and I only discovered it when trying to understand the regex in regex101.com.
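The behaviour described above can be reproduced with a short sketch. It runs both patterns from Wiktor's answer on the failing sample, then tries a line-based pattern that treats any line starting with one or more # and a space as a section boundary. That last pattern is an assumption of this sketch, not part of the original answer, and the variable names are illustrative only.

```javascript
// Sample from the post above: an h1 appears in the body, after an h2.
const text = "## header 2\nsome text\n# header 1\nsome more text\n## header 2b";

// Match regex from the answer: it stops only at "##", so the h1 is swallowed.
const swallowed = text.match(/^#+ [^#]*(?:#(?!#)[^#]*)*/gm);
console.log(swallowed.length); // 2

// Split regex from the answer: cuts before every line that starts with #s and a space.
const bySplit = text.split(/^(?=#+ )/m).filter(Boolean);
console.log(bySplit.length); // 3

// Hypothetical line-based match: a heading line, then every following line
// that is not itself a heading line.
const byLine = text.match(/^#+ .*(?:\n(?!#+ ).*)*/gm);
console.log(byLine);
```

On this sample the split pattern already separates the h1 section, so it may be the simpler choice. None of these patterns knows about fenced code blocks, so a line starting with "# " inside one would still be treated as a heading.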
Upvotes: 1
Reputation: 627327
Use split with the /^(?=#+ )/m regex (demo), or match with match(/^#+ [^#]*(?:#(?!#)[^#]*)*/gm) (see another demo):
// Markdown source; the blank lines around the headings are part of the input.
let contents = `# Glossary

This guide is aimed to familiarize the users with definitions to relevant DVC
concepts and terminologies which are frequently used.

## Workspace directory

Also abbreviated as workspace, it is the root directory of a project where DVC
is initialized by running \`dvc init\` command. Therefore, this directory will
contain a \`.dvc\` directory as well.

## Cache directory

DVC cache is a hidden storage which is found at \`.dvc/cache\`. This storage is
used to manage different versions of files which are under DVC control. For more
information on cache, please refer to the this
[guide](/doc/commands-reference/config#cache).`;
// Option 1: split at every line start that is followed by a heading marker.
console.log(contents.split(/^(?=#+ )/m).filter(Boolean));
// Option 2: match each heading plus everything up to the next "##".
console.log(contents.match(/^#+ [^#]*(?:#(?!#)[^#]*)*/gm));
Output (the same array for both calls):
[
"# Glossary\n\nThis guide is aimed to familiarize the users with definitions to relevant DVC\nconcepts and terminologies which are frequently used.\n\n",
"## Workspace directory\n\nAlso abbreviated as workspace, it is the root directory of a project where DVC\nis initialized by running `dvc init` command. Therefore, this directory will\ncontain a `.dvc` directory as well.\n\n",
"## Cache directory\n\nDVC cache is a hidden storage which is found at `.dvc/cache`. This storage is\nused to manage different versions of files which are under DVC control. For more\ninformation on cache, please refer to the this\n[guide](/doc/commands-reference/config#cache)."
]
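As for why the pattern in the question stops short: [^(#{2}\s)] is a negated character class, so it excludes the individual characters (, #, {, 2, }, ) and whitespace rather than the sequence "## ". The sketch below shows this on a shortened sample, next to a variant that uses a negative lookahead to test for a sequence. The sample text, the variable names, and the lookahead variant are assumptions made for illustration and are not part of the answer above.

```javascript
const sample =
  "# Glossary\n\nThis guide is aimed at users.\n\n" +
  "## Workspace directory\n\nAlso abbreviated as workspace.";

// Pattern from the question: the class stops at the first whitespace character,
// so each match ends after the first word of the body.
const attempt = sample.match(/#{1,2}\s.+\n{2}[^(#{2}\s)]*/g);
console.log(attempt);

// Variant with a negative lookahead: consume any character as long as a new
// heading does not start at the current position.
const fixed = sample.match(/^#{1,2}\s.+\n{2}(?:(?!^#{1,2}\s)[\s\S])*/gm);
console.log(fixed);
```

The lookahead is checked before every consumed character, which is slower than the split approach, so it is mainly useful for seeing how the original idea could be expressed.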
[Image: graph of regex #2 (the matching pattern)]
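Regex #2 can also be read piece by piece. The sketch below rebuilds it from commented parts and checks that the result has the same source as the literal. The part names and the sample string are illustrative assumptions, not taken from the answer.

```javascript
// Parts of regex #2 (names are illustrative only).
const headingStart = "^#+ "; // one or more # at the start of a line, then a space
const nonHashRun = "[^#]*";  // any run of characters other than #, newlines included
const loneHash = "#(?!#)";   // a single # that is not followed by another #

// Heading start, a run without #, then repeatedly: a lone # and another run.
const pattern = new RegExp(
  headingStart + nonHashRun + "(?:" + loneHash + nonHashRun + ")*",
  "gm"
);

const literal = /^#+ [^#]*(?:#(?!#)[^#]*)*/gm;
console.log(pattern.source === literal.source); // true

// A lone # inside the body is consumed; the match ends at the next "##".
const sample = "# A\ntext with a #tag\n## B\nmore";
const parts = sample.match(pattern);
console.log(parts);
```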
Upvotes: 2