Reputation: 9668
I want to split markdown text like the following, first into parts by heading and then into sentences:
# Heading
some text including multiple sentences...
## another heading
some text including multiple sentences....
## ...
The result should look like this:
# Heading
sent1
-----
sent2
-----
....
----
## another heading
sent1
----
sent2
----
....
----
## ...
Here is what I tried:
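// text holds the markdown input.
// Put a \r marker after every heading line.
var HReg = /^(#{1,6}\s)(.*)/gm;
// Group 1 catches abbreviations like "e.g."; group 2 catches . ? or !
// followed by whitespace and a letter (a sentence end).
var SentReg = /\b(\w\.\w\.)|([.?!])\s+(?=[A-Za-z])/g;
var res1 = text.replace(HReg, function (m, g1, g2) {
  return g1 + g2 + "\r";
});
var result = res1.replace(SentReg, function (m, g1, g2) {
  return g1 ? g1 : g2 + "\r"; // keep abbreviations intact, mark real sentence ends
});
var arr = result.split('\r');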
But it sometimes separates a heading from its first sentence, or attaches the next heading to the preceding sentence.
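One likely cause of the glued headings: SentReg only inserts a break when the whitespace after . ? or ! is followed by a letter, because of (?=[A-Za-z]). At the end of a paragraph the next non-space character is the # of the following heading, so no break is inserted there. A minimal sketch that sidesteps this works one line at a time: heading lines (same prefix as HReg) start a new section, and every other line is split with the same SentReg replace trick. The helper name splitMarkdown and the { heading, sentences } shape are hypothetical, and the sketch assumes each sentence sits on a single line (a sentence wrapped across two lines would be split in two).

```javascript
// Sketch, not a full Markdown parser. splitMarkdown is a hypothetical helper.
// Headings are lines starting with 1-6 '#' and whitespace (same as HReg).
function splitMarkdown(text) {
  // Same pattern as SentReg: group 1 keeps abbreviations like "e.g." intact,
  // group 2 is the sentence-ending mark.
  const sentReg = /\b(\w\.\w\.)|([.?!])\s+(?=[A-Za-z])/g;
  const sections = [];
  let current = null;

  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '') continue; // skip blank lines

    if (/^#{1,6}\s/.test(line)) {
      // A heading line always starts a new section.
      current = { heading: line.trim(), sentences: [] };
      sections.push(current);
      continue;
    }

    if (current === null) {
      // Text before the first heading gets a section with an empty heading.
      current = { heading: '', sentences: [] };
      sections.push(current);
    }

    // Mark sentence ends with '\r' (as in the question), then split on it.
    // Only this one line is processed, so a heading can never be glued on.
    const marked = line.replace(sentReg, (m, g1, g2) => (g1 ? g1 : g2 + '\r'));
    for (const sentence of marked.split('\r')) {
      if (sentence.trim() !== '') current.sentences.push(sentence.trim());
    }
  }
  return sections;
}

const md = `# Heading
First one. Second, e.g. this one! Third?
## Another heading
Last one.
`;
console.log(JSON.stringify(splitMarkdown(md), null, 2));
```

Keeping the heading check and the sentence split in separate steps means the sentence regex never sees a line break followed by #, which is exactly the case the single-pass version gets wrong.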
Upvotes: 0
Views: 1096
Reputation: 7378
This is by no means the best option (a proper Markdown parser is recommended), but here is a regex that works well enough as a proof of concept:
var s = `# Heading
some text, including multiple sentences. some text including multiple sentences! some text including multiple sentences?
## another heading
some text including multiple sentences. some text including multiple sentences! some text including multiple sentences?
## ABC
some text including multiple sentences. some text including multiple sentences! some text including multiple sentences?
`;
var result = s.match(/(#+.*)|([^!?;.\n]+.)/g).map(v => v.trim());
// result:
0: "# Heading"
1: "some text, including multiple sentences."
2: "some text including multiple sentences!"
3: "some text including multiple sentences?"
4: "## another heading"
5: "some text including multiple sentences."
6: "some text including multiple sentences!"
7: "some text including multiple sentences?"
8: "## ABC"
9: "some text including multiple sentences."
10: "some text including multiple sentences!"
11: "some text including multiple sentences?"
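The match returns a flat array. To get the layout asked for in the question, one option is to walk that array and emit a separator line after every sentence. This is a sketch: the helper name toDashedLayout, the five-dash separator, and treating any item that starts with # as a heading are assumptions for illustration.

```javascript
// Hypothetical helper: turns the flat array from s.match(...) into the
// question's layout, with a separator line after every sentence.
function toDashedLayout(parts, separator = '-----') {
  const lines = [];
  for (const part of parts) {
    lines.push(part);
    // Heuristic: items starting with '#' are headings (the regex's first
    // alternative) and get no separator after them.
    if (!part.startsWith('#')) lines.push(separator);
  }
  return lines.join('\n');
}

const sample = `# Heading
some text. more text!
## another heading
other text?
`;
const parts = sample.match(/(#+.*)|([^!?;.\n]+.)/g).map(v => v.trim());
console.log(toDashedLayout(parts));
```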
You can remove ; from between the [ ] if you want semicolons to stay inside a sentence block. Of course, this does not protect you from anyone who decides not to use punctuation. ;)
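A quick check of what the ; in the character class changes, running the answer's regex on a made-up sample string (the sample text and variable names are just for illustration). The last case shows the punctuation caveat: text with no . ! ? or ; comes back as one block, however many sentences it contains.

```javascript
// The answer's pattern, and the same pattern with ';' removed from the class.
const withSemicolonRe = /(#+.*)|([^!?;.\n]+.)/g;
const withoutSemicolonRe = /(#+.*)|([^!?.\n]+.)/g;

const text = 'First part; second part. Next sentence!';

// ';' in the class: the semicolon ends a block.
const semicolonSplits = text.match(withSemicolonRe).map(v => v.trim());
// → ["First part;", "second part.", "Next sentence!"]

// ';' removed: the semicolon stays inside the sentence.
const semicolonKept = text.match(withoutSemicolonRe).map(v => v.trim());
// → ["First part; second part.", "Next sentence!"]

// No punctuation at all: the whole run is returned as a single block.
const noPunctuation = 'first thought second thought'.match(withSemicolonRe).map(v => v.trim());
// → ["first thought second thought"]
```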
Upvotes: 1