Reputation: 2579
Here's the content of a Google Document:
Some text, more text...
<li>
some lines
more lines...
</li>
And more text
I would like a regex to match:
<li>
...
</li>
So far it returns null. My regex only finds <li>...</li> on a single line, but not with new lines, although I am using the (?s) flag suggested to ensure that the dot (.) matches any character, including new lines:
(?s)<li>(.)*?</li>
My regex works in https://regexr.com/ and https://regex101.com/, so I don't understand why it doesn't work in Google Apps Script.
Upvotes: 0
Views: 429
Reputation: 201613
You want to search for <li ...>....</li> in a Google Document. If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
If you want to use the pattern <li sheet="[a-zA-Z0-9]*">[\s\S]*?<\/li>, please modify it to <li sheet="[a-zA-Z0-9]*">[\\s\\S]*?<\/li>, because the pattern is passed as a string and the backslashes have to be escaped.
In your case, <li ...>....</li> spans several paragraphs (judging from your sample value). Because of this, when const searchPattern = '<li sheet="[a-zA-Z0-9]*">[\\s\\S]*?<\/li>' is used for body.findText(searchPattern), null is returned. If <li ...>....</li> is put in one paragraph, body.findText(searchPattern) returns <li ...>....</li>.
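Both points can be checked in plain JavaScript, outside Apps Script. The sketch below is only a model: JavaScript's own RegExp stands in for the engine behind findText, the paragraphs array is a hypothetical stand-in for the document body, and matchPerParagraph is an illustrative helper that assumes, as described above, that a match cannot cross a paragraph boundary.

```javascript
// In a string literal, "\s" is not a recognized escape, so the backslash is dropped.
const single = '<li sheet="[a-zA-Z0-9]*">[\s\S]*?<\/li>';
const doubled = '<li sheet="[a-zA-Z0-9]*">[\\s\\S]*?<\/li>';

console.log(single);  // <li sheet="[a-zA-Z0-9]*">[sS]*?</li>
console.log(doubled); // <li sheet="[a-zA-Z0-9]*">[\s\S]*?</li>

// Hypothetical stand-in for a document body: one string per paragraph.
const paragraphs = ['<li sheet="other">', '{{test}}', '</li>'];

// Illustrative helper: try the pattern on each paragraph separately,
// modelling a search that cannot cross paragraph boundaries.
function matchPerParagraph(pattern, paras) {
  const re = new RegExp(pattern);
  for (const p of paras) {
    const m = re.exec(p);
    if (m) return m[0];
  }
  return null;
}

console.log(matchPerParagraph(doubled, paragraphs));            // null
console.log(matchPerParagraph(doubled, [paragraphs.join('')])); // <li sheet="other">{{test}}</li>
```

The last two lines show the difference: the same pattern fails when the tags and content sit in separate paragraphs and succeeds once everything is in a single one.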
In order to search for <li ...>....</li> that spans several paragraphs, how about the following workaround? The flow of this workaround is as follows.
1. Use <li sheet= and <\/li> as patterns for searching.
2. Using <li sheet=, retrieve the begin paragraph of <li ...>.
3. Using <\/li>, retrieve the end paragraph of </li>.
4. Using a while loop, all <li ...>....</li> values are searched.
function parseLists(body) {
  // var doc = DocumentApp.getActiveDocument();
  // var body = doc.getBody();
  var pattern1 = "<li sheet=";
  var pattern2 = "<\/li>";
  var range1 = body.findText(pattern1);
  var res = [];
  while (range1) {
    var temp = {};
    // Paragraph that contains the opening tag.
    var p1 = range1.getElement().getParent();
    temp.startIndex = body.getChildIndex(p1);
    // Search for the closing tag, starting from the opening tag.
    var range2 = body.findText(pattern2, range1);
    if (!range2) break; // No closing tag was found, so stop searching.
    // Paragraph that contains the closing tag.
    var p2 = range2.getElement().getParent();
    temp.endIndex = body.getChildIndex(p2) + 1;
    var texts = "";
    // for (var i = temp.startIndex + 1; i < temp.endIndex - 1; i++) {
    for (var i = temp.startIndex; i < temp.endIndex; i++) {
      texts += body.getChild(i).asParagraph().getText();
    }
    temp.texts = texts;
    res.push(temp);
    // Continue with the next opening tag after this closing tag.
    range1 = body.findText(pattern1, range2);
  }
  Logger.log(res);
}
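The flow can also be tried without a Google Document. The following is a minimal sketch in plain JavaScript, not the Apps Script code itself: findBlocks is a hypothetical helper, and the sample array is an assumed layout in which each string plays the role of one body paragraph. It uses simple substring checks instead of findText.

```javascript
// Hypothetical model of the flow: each array entry stands for one paragraph.
function findBlocks(paragraphs, openTag, closeTag) {
  const res = [];
  let i = 0;
  while (i < paragraphs.length) {
    // Look for the paragraph that contains the opening tag.
    if (!paragraphs[i].includes(openTag)) {
      i++;
      continue;
    }
    // From there, look for the paragraph that contains the closing tag.
    let end = i;
    while (end < paragraphs.length && !paragraphs[end].includes(closeTag)) {
      end++;
    }
    if (end === paragraphs.length) break; // Opening tag without a closing tag.
    res.push({
      startIndex: i,
      endIndex: end + 1, // Exclusive, as in the script above.
      texts: paragraphs.slice(i, end + 1).join(''),
    });
    // Continue searching after the closing paragraph.
    i = end + 1;
  }
  return res;
}

// Assumed sample layout; the real paragraph breaks depend on the document.
const sample = [
  '<li sheet="experiences">',
  '{{company_name}}',
  '{{from}} - {{to}}',
  '{{description}}',
  '</li>',
  '',
  '<li sheet="other">',
  '{{test}}',
  '</li>',
];

console.log(JSON.stringify(findBlocks(sample, '<li sheet=', '</li>'), null, 2));
```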
When your sample values are put into a new Google Document and parseLists is run, the following result is retrieved.
[
{
"startIndex": 0,
"endIndex": 5,
"texts": "<li sheet=\"experiences\">{{company_name}}, {{job_location}} — {{job_title}}MONTH {{from}} - {{to}}{{description}}</li>"
},
{
"startIndex": 6,
"endIndex": 9,
"texts": "<li sheet=\"other\">{{test}}</li>"
}
]
For the above result, if you want to retrieve the values {{company_name}}, {{job_location}} — {{job_title}}MONTH {{from}} - {{to}}{{description}} and {{test}} without the tags, please modify the above script as follows.
From:
for (var i = temp.startIndex; i < temp.endIndex; i++) {
To:
for (var i = temp.startIndex + 1; i < temp.endIndex - 1; i++) {
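The effect of the two loop bounds can be compared on a plain array. This is a sketch under the assumption that each string stands for one paragraph; joinParagraphs and lines are hypothetical names. Note that the narrower bounds drop the first and last paragraphs entirely, so they only give the expected text when each tag sits in its own paragraph.

```javascript
// Hypothetical helper: concatenate lines[startIndex .. endIndex - 1].
// With keepTags = false, the first and last paragraphs are skipped,
// mirroring the "To" version of the loop.
function joinParagraphs(lines, startIndex, endIndex, keepTags) {
  const from = keepTags ? startIndex : startIndex + 1;
  const to = keepTags ? endIndex : endIndex - 1;
  let texts = '';
  for (let i = from; i < to; i++) {
    texts += lines[i];
  }
  return texts;
}

const lines = ['<li sheet="other">', '{{test}}', '</li>'];
console.log(joinParagraphs(lines, 0, 3, true));  // <li sheet="other">{{test}}</li>
console.log(joinParagraphs(lines, 0, 3, false)); // {{test}}

// If the tags share a paragraph with the content, that content is skipped too.
console.log(joinParagraphs(['<li sheet="other">{{test}}</li>'], 0, 1, false)); // prints an empty line
```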
If I misunderstood your question and this was not the direction you want, I apologize.
Upvotes: 2