ashokkumar ponnnusamy
ashokkumar ponnnusamy

Reputation: 75

Count total number of pages in pdf file

Every week, I'll be receiving a set of pdf files from my clients.

They will paste the pdf files in the specific google drive folder. I need a total number of pages of the pdf file. I was trying to create a code in Apps script which will helps to update the pdf file name and the total number of pages in the particular Google sheet.

I found the code which was created for the google docs here and here.

But that doesn't work. I am looking for a Apps script which helps to check the particular drive folder and update the pdf file name and the total number of pages in the specific google sheet.

I have tried to below script.

function getNumberofPages() {
  var myFolder = DriveApp.getFoldersByName("Test").next();
  var files = myFolder.searchFiles('title contains ".PDF"');
  while (files.hasNext()) {
    var file = files.next();
    Logger.log(file.getName());
    Logger.log(file.length);
  }
}

But the length option is not working of pdf file....

Upvotes: 0

Views: 5274

Answers (2)

Tanaike
Tanaike

Reputation: 201593

Unfortunately, there are no methods for directly retrieving the total pages from a PDF file using Google APIs yet. So how about these workarounds? Please choose it for your situation.

Workaround 1:

In this workaround, it retrieves the number of content streams in the PDF file. The content streams is shown as the attribute of /Contents.

When this is reflected to your script, it becomes as follows.

Modified script:

function getNumberofPages() {
  var myFolder = DriveApp.getFoldersByName("Test").next();
  var files = myFolder.searchFiles('title contains ".PDF"');
  while (files.hasNext()) {
    var file = files.next();
    var n = file.getBlob().getDataAsString().split("/Contents").length - 1;
    Logger.log("fileName: %s, totalPages: %s", file.getName(), n)
  }
}
  • Although this workaround is simple, it might be able to not use for all PDF files as @mkl says. If this workaround cannot be used for your PDF files, how about the following workaround 2?

Workaround 2:

In this workaround, an API is used for retrieving the total pages of PDF file. I used Split PDF API. The total pages are retrieved from the number of split files. When you use this API, please check ConvertAPI and retrieve your secret key.

Modified script:

function getNumberofPages() {
  var myFolder = DriveApp.getFoldersByName("Test").next();
  var files = myFolder.searchFiles('title contains ".PDF"');
  while (files.hasNext()) {
    var file = files.next();
    var url = "https://v2.convertapi.com/convert/pdf/to/split?Secret=#####"; // Please set your secret key.
    var options = {
      method: "post",
      payload: {File: DriveApp.getFileById(file.getId()).getBlob()},
    }
    var res = UrlFetchApp.fetch(url, options);
    res = JSON.parse(res.getContentText());
    Logger.log("fileName: %s, totalPages: %s", file.getName(), res.Files.length)
  }
}
  • I'm not sure about the number of PDF files and file size. So I didn't use fetchAll method for this. This is a sample script. So please modify this for your situation.

Note:

  • I can use these workarounds in my applications. But I have not been able to confirm for all PDF files. So if these workarounds didn't work for your PDF files, I'm sorry.

Reference:

Workaround 3:

As another approach, when this method is used, the sample script for retrieving the number of pages of PDF data is as follows.

async function myFunction() {
  const cdnjs = "https://cdn.jsdelivr.net/npm/pdf-lib/dist/pdf-lib.min.js";
  eval(UrlFetchApp.fetch(cdnjs).getContentText()); // Load pdf-lib
  const setTimeout = function (f, t) {
    // Overwrite setTimeout with Google Apps Script.
    Utilities.sleep(t);
    return f();
  };

  const myFolder = DriveApp.getFoldersByName("Test").next();
  const files = myFolder.searchFiles('title contains ".PDF"');
  const ar = [];
  while (files.hasNext()) {
    ar.push(files.next())
  }
  for (let i = 0; i < ar.length; i++) {
    const file = ar[i];
    const pdfData = await PDFLib.PDFDocument.load(new Uint8Array(file.getBlob().getBytes()));
    const n = pdfData.getPageCount();
    console.log("fileName: %s, totalPages: %s", file.getName(), n);
  }
}

Note:

Upvotes: 5

ashokkumar ponnnusamy
ashokkumar ponnnusamy

Reputation: 75

function menuItem() {
var folder = 
DriveApp.getFoldersByName('Test').next();
var contents = folder.searchFiles('title contains ".PDF"');
var file;
var name;
var sheet = SpreadsheetApp.getActiveSheet();
var count;

sheet.clear();
sheet.appendRow(["Name", "Number of pages"]);

while(contents.hasNext()) {
file = contents.next();
name = file.getName();
count = 
file.getBlob().getDataAsString().split("/Contents").length - 1;

data = [name, count]
sheet.appendRow(data);
}
};


function onOpen() {
var ui = SpreadsheetApp.getUi();
ui.createMenu('PDF Page Calculator')
          .addItem("PDF Page Calculator", 
'menuItem')
.addToUi();
};

Upvotes: -2

Related Questions