mybecks

Reputation: 2473

Parse nested markdown list with regexp

I want to parse some nested markdown lists, like the one below:

* elem 1
* elem 2
  * child 1
  * child 2
    * child 1
* elem 3
  * child 1

The list nesting is done with tabs, so each level has n tabs. I'm looking for a regex that can give me each level, e.g. Level 3 has \t\t, Level 2 has only \t, and Level 1 has no tab, but all items start with *.

How can I match these requirements with different regexps?

One try for the Level 1 elements was:

^(?=\*).*

But this selects only the first element of Level 1 (e.g. elem 2 and elem 3 are not found).

BR,

mybecks

Upvotes: 1

Views: 2242

Answers (3)

gwillie

Reputation: 1899

If I understand you correctly you want this:

/^\*.*?(?=^\*|\Z)/sm

Basically, it means: match from the beginning of a line, match a literal *, then match anything non-greedily up to, but not including, the next ^\* or the end of input.

EDIT:

This won't work for you, as JavaScript doesn't support \Z. Oops, I had the wrong regex engine flavour enabled; will update shortly :)

EDIT 2:

This should work in JavaScript (with the g and m flags):

^\*[^]+?(?=^\*)|^\*[^]+

I had to use an alternation for the very last element, i.e. if you remove |^\*[^]+ from the end of the regex, it won't match the last element :(.
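
For illustration, here is a minimal sketch of how this pattern might be applied. The sample string `list` and the variable names are assumptions, not part of the original answer; the `g` and `m` flags are used so that `^` matches at every line start and all blocks are returned.

```javascript
// Sample input (an assumption): nesting is expressed with tab characters.
const list = [
  '* elem 1',
  '* elem 2',
  '\t* child 1',
  '\t* child 2',
  '\t\t* child 1',
  '* elem 3',
  '\t* child 1',
].join('\n');

// `[^]` is a JavaScript-specific way to match any character, newlines included.
const topLevelBlocks = list.match(/^\*[^]+?(?=^\*)|^\*[^]+/gm) || [];

// Each block is one top-level item together with its nested children.
topLevelBlocks.forEach((block, i) => {
  console.log('block ' + (i + 1) + ': ' + JSON.stringify(block));
});
```

Each match is a whole top-level item including its children, so the nested levels would still need a second pass.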

Upvotes: 1

Tibos

Reputation: 27843

Here is a function that returns a regexp (based on yours) for matching all the elements on a certain level:

function getNestedRegexp(level) {
  return new RegExp('^(?=\\t{'+level+'}\\*).*','gm');
}

// Usage:
var elements = str.match(getNestedRegexp(1)); // all elements indented by exactly one tab (Level 2 in the question's numbering)

DEMO: http://jsbin.com/EcAKIza/1/edit

As others have mentioned, regexp may not be the best solution here, so be careful if you pick this option.

EDIT: I am not sure why you are using a positive lookahead there. A better regexp could be:

/^\t{N}\*.*/gm

DEMO & EXPLANATION: http://regex101.com/r/rZ7mD1
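
As a rough sketch, the simpler pattern could be wrapped like this. The helper name `itemsAtLevel`, the `sample` string, and the 1-based level numbering (taken from the question, where Level 1 has no tab) are assumptions for illustration. Note that the `g` and `m` flags are what make every matching line show up; without them only the first match is returned.

```javascript
// Hypothetical helper: `level` follows the question's numbering,
// so level 1 means no tabs, level 2 means one tab, and so on.
function itemsAtLevel(text, level) {
  const tabs = level - 1;
  // `m` lets ^ match at each line start; `g` returns every match.
  const re = new RegExp('^\\t{' + tabs + '}\\*.*', 'gm');
  return text.match(re) || [];
}

// Sample input (an assumption): nesting is expressed with tab characters.
const sample =
  '* elem 1\n* elem 2\n\t* child 1\n\t* child 2\n\t\t* child 1\n* elem 3\n\t* child 1';

console.log(itemsAtLevel(sample, 1)); // items with no leading tab
console.log(itemsAtLevel(sample, 3)); // items with two leading tabs
```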

Upvotes: 1

anubhava

Reputation: 785856

I believe you can use:

/^\s+\* (.+)$/gm
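
Note that \s+ requires at least one whitespace character, so unindented items do not match this pattern. A possible variation (an assumption, not the pattern above) captures the leading tabs separately, so that the nesting depth can be read from the match; the names `itemPattern`, `input`, and `items` are made up for this sketch.

```javascript
// Variation (an assumption): capture the leading tabs and the item text separately.
const itemPattern = /^(\t*)\* (.+)$/gm;

// Sample input (an assumption): nesting is expressed with tab characters.
const input = '* elem 1\n* elem 2\n\t* child 1\n\t\t* child 1';

// matchAll needs the `g` flag; each match yields the tabs in m[1] and the text in m[2].
const items = Array.from(input.matchAll(itemPattern), (m) => ({
  level: m[1].length + 1, // 1-based, as in the question
  text: m[2],
}));

console.log(items);
```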

Upvotes: 1
