Google App Scripts regex body.findText(searchPattern) returns null if new lines

Question

Here's a Google Document content:

Some text, more text...


some lines
more lines...


And more text

I would like a regex to match:

...

So far it returns null. My regex only finds

...

, but not when the text contains new lines, although I am using the suggested (?s) flag so that . matches any character, including new lines:

(?s)

(.)*?

My regex works in https://regexr.com/ and https://regex101.com/, so I don't understand why it doesn't work in Google Apps Script.

Tanaike · Accepted Answer

You want to retrieve the text of
....
You want to achieve this using Google Apps Script.

If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.

Issue and workaround:

To let the pattern match across new lines, please replace (.)*? with [\s\S]*?. In your case, however, .... has several paragraphs (I inferred this from your sample value). Because of this, even when the pattern of

const searchPattern = '[\\s\\S]*?'; // placed between the opening and closing tags

is used for body.findText(searchPattern), null is returned. If .... is put in one paragraph, body.findText(searchPattern) returns .... as expected.
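The behavior described above can be mimicked outside Google Docs. The following is a minimal sketch in plain JavaScript, not Apps Script. It assumes that the pattern is evaluated against the text of each paragraph on its own, which is consistent with the observation above but is an assumption about findText. The tags <block> and </block>, the sample paragraphs, and the helper findInParagraphs are hypothetical.

```javascript
// Hypothetical opening and closing tags, for illustration only.
const pattern = /<block>[\s\S]*?<\/block>/;

// Returns the first match found in any single paragraph, or null.
// Assumption: each paragraph is searched separately, so a match
// cannot start in one paragraph and end in another.
function findInParagraphs(paragraphs, regex) {
  for (const text of paragraphs) {
    const match = text.match(regex);
    if (match) return match[0];
  }
  return null;
}

// The tagged block is spread over several paragraphs.
const severalParagraphs = ["<block>", "some lines", "more lines...", "</block>"];
// The same block placed in a single paragraph.
const oneParagraph = ["<block>some lines more lines...</block>"];

const resultSeveral = findInParagraphs(severalParagraphs, pattern);
const resultOne = findInParagraphs(oneParagraph, pattern);
console.log(resultSeveral); // null
console.log(resultOne);     // <block>some lines more lines...</block>
```

Under this assumption, no choice of "match anything" token helps, because no single paragraph contains both the opening and the closing tag.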

In order to search for .... when it has several paragraphs, how about the following workaround? The flow of this workaround is as follows.

Flow:

1. Use the opening tag and the closing tag as two separate patterns for searching.
2. Using the pattern of the opening tag, retrieve the begin paragraph of .....
3. Using the pattern of the closing tag, retrieve the end paragraph of .....
4. Retrieve the texts between the retrieved begin and end paragraphs.
5. Repeat this cycle until all .... values have been searched.
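The flow above can be sketched on a plain array of paragraph strings, without the Document service. This is only a minimal sketch in plain JavaScript: the function findBlocks, the tags <block> and </block>, and the sample paragraphs are hypothetical, and simple substring checks stand in for body.findText.

```javascript
// Hypothetical sketch: each array item stands for one paragraph of the body.
function findBlocks(paragraphs, openTag, closeTag) {
  const res = [];
  let from = 0;
  while (from < paragraphs.length) {
    // Find the paragraph that holds the opening tag.
    const start = paragraphs.findIndex((p, i) => i >= from && p.includes(openTag));
    if (start === -1) break;
    // Find the paragraph that holds the closing tag, at or after the start.
    const end = paragraphs.findIndex((p, i) => i >= start && p.includes(closeTag));
    if (end === -1) break;
    // Collect the text of the paragraphs strictly between the two.
    const texts = paragraphs.slice(start + 1, end).join("");
    res.push({ startIndex: start, endIndex: end + 1, texts: texts });
    // Continue searching after this block.
    from = end + 1;
  }
  return res;
}

const paragraphs = [
  "<block>", "first line", "second line", "</block>",
  "",
  "<block>", "third line", "</block>",
];
const blocks = findBlocks(paragraphs, "<block>", "</block>");
console.log(JSON.stringify(blocks));
```

Here endIndex is the index of the closing paragraph plus 1, and the loop stops as soon as an opening tag has no closing tag after it.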



Sample script:

function parseLists(body) {
  // var doc = DocumentApp.getActiveDocument();
  // var body = doc.getBody();

  // Set these to the opening and closing tags to search for.
  var pattern1 = "";
  var pattern2 = "";
  var res = [];
  var range1 = body.findText(pattern1);
  while (range1) {
    // Paragraph that contains the opening tag.
    var p1 = range1.getElement().getParent();
    // Search for the closing tag, starting after the opening tag.
    var range2 = body.findText(pattern2, range1);
    if (!range2) break; // No closing tag was found, so stop searching.
    // Paragraph that contains the closing tag.
    var p2 = range2.getElement().getParent();
    var temp = {};
    temp.startIndex = body.getChildIndex(p1);
    temp.endIndex = body.getChildIndex(p2) + 1;
    var texts = "";
//      for (var i = temp.startIndex + 1; i < temp.endIndex - 1; i++) {
    for (var i = temp.startIndex; i < temp.endIndex; i++) {
      texts += body.getChild(i).asParagraph().getText();
    }
    temp.texts = texts;
    res.push(temp);
    // Continue with the next opening tag after this closing tag.
    range1 = body.findText(pattern1, range2);
  }
  Logger.log(res);
}


Result:

When your sample values are put into a new Google Document and the script is run, the following result is retrieved.

[
  {
    "startIndex": 0,
    "endIndex": 5,
    "texts": "{{company_name}},  {{job_location}} — {{job_title}}MONTH {{from}} - {{to}}{{description}}"
  },
  {
    "startIndex": 6,
    "endIndex": 9,
    "texts": "{{test}}"
  }
]



For the above result, if you want to retrieve the values of {{company_name}},  {{job_location}} — {{job_title}}MONTH {{from}} - {{to}}{{description}} and {{test}} without the tags, please modify the above script as follows.


From:

for (var i = temp.startIndex; i < temp.endIndex; i++) {

To:

for (var i = temp.startIndex + 1; i < temp.endIndex - 1; i++) {
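To see which paragraphs each loop visits, here is a small sketch in plain JavaScript. The array and the <block> tags are hypothetical; startIndex and endIndex follow the convention of the script, where endIndex is the index of the closing-tag paragraph plus 1.

```javascript
// Hypothetical paragraphs; indices 0 and 3 hold the tags.
const paras = ["<block>", "first line", "second line", "</block>"];
const startIndex = 0;
const endIndex = 4; // index of the closing-tag paragraph + 1

// Original bounds: the tag paragraphs are included.
let withTags = "";
for (let i = startIndex; i < endIndex; i++) {
  withTags += paras[i];
}

// Modified bounds: the tag paragraphs are skipped.
let withoutTags = "";
for (let i = startIndex + 1; i < endIndex - 1; i++) {
  withoutTags += paras[i];
}

console.log(withTags);    // <block>first linesecond line</block>
console.log(withoutTags); // first linesecond line
```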




References:


findText(searchPattern, from)
getChildIndex(child)
getParent()


If I misunderstood your question and this was not the direction you want, I apologize.
