Reputation: 663
I have a giant string (Markdown) that contains something like this:
## Header 1
{~1.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~2.0} vitae congue erat accumsan nec. {~3.0}
{~4.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~5.0} vitae congue erat accumsan nec. {~6.0}
{~7.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~8.0} vitae congue erat accumsan nec. {~9.0}
## Header 2
{~10.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~11.0} vitae congue erat accumsan nec. {~12.0}
{~113.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~14.0} vitae congue erat accumsan nec. {~15.0}
{~16.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~17.0} vitae congue erat accumsan nec. {~18.0}
## Header 3
{~19.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~20.0} vitae congue erat accumsan nec. {~21.0}
{~22.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~23.0} vitae congue erat accumsan nec. {~24.0}
{~25.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~26.0} vitae congue erat accumsan nec. {~27.0}
A marker looks like this: {~x.x}
I will call a "section" the combination of a header and one or more paragraphs.
I need to match the first and the last marker of every section.
Currently I'm using this regex in JavaScript, which I got from the selected answer of this question, to capture all the markers:
/\s?{([^}]*(~\d*(?:\.\d+)?)[^}]*)}\s?/g
Now I need to modify it to capture only the first and the last marker of every 'section'.
The string comes from user input, so I cannot know in advance how many paragraphs a 'section' will have, nor the content of the headers. All I know is that there will be at least one section (meaning one header followed by some number of paragraphs).
Upvotes: 1
Views: 81
Reputation: 10783
This is possible with lookarounds, which JavaScript supports (lookbehind since ES2018).
Since we're reusing the original pattern a lot, let's store it in a variable:
const pattern = String.raw`{([^}]*(?:~\d*(?:\.\d+)?)[^}]*)}`;
A string that doesn't contain the pattern above looks like this, where [^] matches any character (including newlines), similar to a . with the s flag:
`(?:(?!${pattern})[^])*`
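As a rough illustration of that fragment, here is a minimal sketch. The names `marker`, `noMarker` and `sample` are made up for this example and are not part of the answer. Anchored at the start of a string, the fragment consumes everything up to, but not including, the first marker:

```javascript
// Hypothetical names for this sketch; the marker pattern is the one shown above.
const marker = String.raw`{([^}]*(?:~\d*(?:\.\d+)?)[^}]*)}`;

// "Any run of characters in which no marker starts", anchored at the start.
const noMarker = new RegExp(`^(?:(?!${marker})[^])*`);

// Made-up input spanning two lines, to show that [^] also crosses newlines.
const sample = 'Lorem ipsum\ndolor {~2.0} sit amet';
const prefix = sample.match(noMarker)[0];

console.log(JSON.stringify(prefix)); // "Lorem ipsum\ndolor "
```

The negative lookahead is checked before every single character, which is what stops the run right in front of `{~2.0}`.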
From that, we construct our lookahead and lookbehind:
// Pattern, anything that doesn't contain pattern, then header or end of string (not end of line).
const lookahead = `${pattern}(?=(?:(?!${pattern})[^])*(?:^##.+|(?![^])))`;
// Header, anything that doesn't contain pattern, then pattern itself.
const lookbehind = `(?<=^##.+$(?:(?!${pattern})[^])*)${pattern}`;
Here's how our final steps go:
const regex = new RegExp(`${lookbehind}|${lookahead}`, 'gm');
// Filter out unmatched groups.
[...text.matchAll(regex)].map(match => match.filter(Boolean));
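To make the shape of the filtered result concrete, here is a small sketch on a made-up two-section input (the `sample` string and the `edges` name are assumptions for illustration only). It rebuilds the regex from the steps above so it can run on its own:

```javascript
// Rebuild the regex exactly as in the steps above.
const pattern = String.raw`{([^}]*(?:~\d*(?:\.\d+)?)[^}]*)}`;
const lookahead = `${pattern}(?=(?:(?!${pattern})[^])*(?:^##.+|(?![^])))`;
const lookbehind = `(?<=^##.+$(?:(?!${pattern})[^])*)${pattern}`;
const regex = new RegExp(`${lookbehind}|${lookahead}`, 'gm');

// Hypothetical input: section A has three markers, section B has two.
const sample = '## A\n{~1.0} a {~2.0} b {~3.0}\n## B\n{~4.0} c {~5.0}';

// After filtering, each entry is [full marker, text inside the braces].
const edges = [...sample.matchAll(regex)].map(match => match.filter(Boolean));

console.log(edges.map(([full]) => full));
// [ '{~1.0}', '{~3.0}', '{~4.0}', '{~5.0}' ]
```

The middle marker `{~2.0}` is skipped because it is neither directly reachable from a header nor directly followed by a header or the end of the string.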
Try it:
function match(string) {
  const pattern = String.raw`{([^}]*(?:~\d*(?:\.\d+)?)[^}]*)}`;
  const lookahead = `${pattern}(?=(?:(?!${pattern})[^])*(?:^##.+|(?![^])))`;
  const lookbehind = `(?<=^##.+$(?:(?!${pattern})[^])*)${pattern}`;
  const regex = new RegExp(`${lookbehind}|${lookahead}`, 'gm');
  console.log(regex); // Just to show you how monstrous it is.
  return string.matchAll(regex);
}
const text = `
## Header 1
{~1.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~2.0} vitae congue erat accumsan nec. {~3.0}
{~4.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~5.0} vitae congue erat accumsan nec. {~6.0}
{~7.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~8.0} vitae congue erat accumsan nec. {~9.0}
## Header 2
{~10.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~11.0} vitae congue erat accumsan nec. {~12.0}
{~113.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~14.0} vitae congue erat accumsan nec. {~15.0}
{~16.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~17.0} vitae congue erat accumsan nec. {~18.0}
## Header 3
{~19.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~20.0} vitae congue erat accumsan nec. {~21.0}
{~22.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~23.0} vitae congue erat accumsan nec. {~24.0}
{~25.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~26.0} vitae congue erat accumsan nec. {~27.0}
`.trim();
console.log([...match(text)].map(match => match.filter(Boolean)));
Upvotes: 0
Reputation: 1440
This is my variant. It is perhaps less regex-heavy than most others, but it works:
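function getNumbers(str) {
  // Prepend a newline so a header on the very first line is split off too.
  return `\n${str}`.split('\n## ')
    // Collect every {~n.n} marker found in each section.
    .map(section => [...section.matchAll(/\{~[\d.]*\}/g)].map(m => m[0]))
    // Keep only the first and last marker; sections without markers yield nothing.
    .flatMap(markers => markers.length ? [markers[0], markers[markers.length - 1]] : [])
    // Strip the braces and the tilde, then convert to a number.
    .map(marker => +marker.replace(/[{}~]/g, ''));
}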
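A short usage sketch, with made-up inputs (`twoSections` and `oneMarker` are hypothetical names). The function is repeated with the same logic so the example is self-contained:

```javascript
// Same logic as the function above, repeated so this sketch runs on its own.
function getNumbers(str) {
  return `\n${str}`.split('\n## ')
    .map(x => [...x.matchAll(/\{~(\d|\.)*\}/g)].map(m => m[0]))
    .map(x => [x[0], x.slice(-1)]).flat(2).filter(x => x)
    .map(x => +x.replace(/[{}~]/g, ''));
}

// Hypothetical input: two sections with three and two markers.
const twoSections = '## A\n{~1.0} a {~2.0} b {~3.0}\n## B\n{~4.0} c {~5.0}';
console.log(getNumbers(twoSections)); // [ 1, 3, 4, 5 ]

// A section with a single marker reports it as both first and last.
const oneMarker = '## C\n{~7.5} only';
console.log(getNumbers(oneMarker)); // [ 7.5, 7.5 ]
```

Note that this variant returns numbers rather than the matched marker text, and that it only recognizes markers of the exact form {~digits}, which is narrower than the regex in the question.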
Upvotes: 1