RSid

Reputation: 788

In Adobe Acrobat 9, how do I use a regex in the JavaScript Console to search the text of a pdf?

In Adobe Acrobat 9, how do I apply a regex to search the text of a pdf and/or index of a series of pdfs?

There are 200 or so keywords that I need to search for. I could do it manually through each index, but I'll have to do this several times for a lot of indexes/pdfs and want to automate as much as possible.

It's easy enough to search the text of a pdf from the JavaScript console, say for the word 'the':

search.query("the","ActiveDoc");

And having a regex interact with a string you've written in the console is no problem either:

var string = "I hope this works9867";
var regex = /\d/;

if (regex.test(string)) {
    app.alert("win", 2);
}

But I can't get a regex to apply to the OCR-ed text of a pdf and have found no guides on how to do so thus far. It seemed logical that either

var regex=/\d/

search.query(regex,"ActiveDoc");

or some close variant on

search.query(/\d/,"ActiveDoc");

would work, but no dice. Is there a way to do this? Ideally the method would work for indexes and pdfs alike.

Upvotes: 1

Views: 5884

Answers (1)

Jesse Good

Reputation: 52365

You can't use regular expressions with search.query; it expects the query as a plain string. There are two ways you can make searching easier:

Method #1: Put everything you want to search for in a single query string and pass that to search.query.

var myQuery = "stuff you want to search for";
search.query(myQuery, "ActiveDoc");

You can also change how the query is interpreted. For example, this runs a Boolean OR query over every pdf in a folder:

search.wordMatching = "BooleanQuery"; // Treat the query as a Boolean expression
search.matchWholeWord = false; // Allow partial-word matches
var myQuery = "Word1 OR Word2 OR Word3";
search.query(myQuery, "Folder", "/c/myDocuments");

For more examples of how to configure search.query, refer to Adobe's JavaScript for Acrobat API Reference.
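Since the question mentions roughly 200 keywords, here is a minimal sketch of building the Boolean query from a list instead of typing it by hand. The keyword values and the buildOrQuery helper are hypothetical, and the sketch assumes every keyword is a single word. Whether Acrobat accepts a query this long in one call is untested here, so you may need to split the list into batches.

```javascript
// Hypothetical keyword list; replace with your own single-word terms.
var keywords = ["invoice", "refund", "receipt"];

// Join the terms into one Boolean query string: "a OR b OR c".
function buildOrQuery(terms) {
    return terms.join(" OR ");
}

var query = buildOrQuery(keywords);

// Only run the search inside Acrobat, where the global `search` object exists.
if (typeof search !== "undefined") {
    search.wordMatching = "BooleanQuery";
    search.query(query, "ActiveDoc");
}
```

To search an index or folder instead, change the second argument to search.query as in the example above.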

Method #2: Extract the text out of the PDF document and perform a regex search on the string.
The following code loops through every page of the document, joins the words on that page into a string, and tests the string against a regex (here /Hello/). On the first page that matches, it passes the matched text to search.query.

var pattern = /Hello/; // The regex to look for
for (var i = 0; i < this.numPages; i++) { // Loop through every page
    var numWords = this.getPageNumWords(i); // Find out how many words are on the page
    var wordString = ""; // Prepare a string
    for (var j = 0; j < numWords; j++) { // Put all the words on the page into a string
        wordString = wordString + " " + this.getPageNthWord(i, j);
    }
    var found = wordString.match(pattern); // null when the page has no match
    if (found) {
        search.matchWholeWord = true;
        search.query(found[0], "ActiveDoc"); // Pass the matched text, not the match array
        break; // The query already covers the whole document
    }
}
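If you want the regex matches themselves rather than a search-window query, the same extraction loop can collect them per page. This is only a sketch: findMatches and the shape of its result are made up for illustration, and it relies on nothing from the document beyond numPages, getPageNumWords and getPageNthWord. Note that the words are joined with single spaces, so a pattern that depends on the original punctuation or spacing may not match.

```javascript
// Collect every regex match in a document, grouped by 1-based page number.
// `doc` must provide numPages, getPageNumWords(page) and getPageNthWord(page, n).
function findMatches(doc, pattern) {
    // Copy the pattern with the global flag so exec() walks through all matches.
    var re = new RegExp(pattern.source, "g" + (pattern.ignoreCase ? "i" : ""));
    var results = [];
    for (var p = 0; p < doc.numPages; p++) {
        var words = [];
        var count = doc.getPageNumWords(p);
        for (var w = 0; w < count; w++) {
            words.push(doc.getPageNthWord(p, w));
        }
        var text = words.join(" ");
        var m;
        re.lastIndex = 0; // Start from the beginning of each page's text
        while ((m = re.exec(text)) !== null) {
            results.push({ page: p + 1, match: m[0] });
            if (m[0] === "") { re.lastIndex++; } // Avoid looping forever on empty matches
        }
    }
    return results;
}

// Inside the Acrobat console, `this` is the active document.
if (typeof app !== "undefined") {
    var hits = findMatches(this, /\d+/);
    for (var i = 0; i < hits.length; i++) {
        console.println("Page " + hits[i].page + ": " + hits[i].match);
    }
}
```

Each distinct match could then be fed to search.query one at a time if you still want the results in Acrobat's search window.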

Upvotes: 2
