cis

Reputation: 1377

Check if base64 string contains a valid PDF - and nothing else

In my web application, users may only upload images and PDFs. These are sent as base64 strings from the frontend to the backend. There, on the Node.js 8.9 server, I want to do some sanity checking, i.e. test whether the base64 strings I get are actually just images and PDFs, and nothing else.

For images, that was easy: using the sharp npm module with failOnError set to true gave me exactly what I wanted. A single wrong character in the base64 string causes a failure, and the input is rejected.

However, I cannot find a similar solution for PDFs. I tried pdf2json (which seemed overpowered for my requirement anyway), but could not get it to accept the base64 strings after converting them to a buffer.

Upvotes: 1

Views: 4854

Answers (1)

cis

Reputation: 1377

I finally found an npm module that does exactly what I expect: HummusJS. The code below works as far as my tests go: valid PDFs are accepted, while invalid strings are rejected. I haven't noticed any performance impact so far.

const hummus = require('hummus');

const pdfBase64String = '<<base64 string here>>';
try {
  // Decode the base64 payload. Note that Buffer.from ignores characters
  // outside the base64 alphabet instead of throwing.
  const bufferPdf = Buffer.from(pdfBase64String, 'base64');
  // createReader throws if the buffer cannot be parsed as a PDF.
  const pdfReader = hummus.createReader(new hummus.PDFRStreamForBuffer(bufferPdf));
  const pages = pdfReader.getPagesCount();
  if (pages > 0) {
    console.log('Parsable with Hummus and more than 0 pages. Seems to be a valid PDF!');
  } else {
    console.log("Unexpected outcome for number of pages: '" + pages + "'");
  }
} catch (err) {
  console.log('ERROR while handling buffer of pdfBase64 and/or trying to parse PDF: ' + err);
}
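One caveat with the snippet above: Node's base64 decoder is lenient and silently skips invalid characters, so a string with a stray character can still decode to a parsable PDF. If "one wrong char should fail" matters, a cheap pre-check before the full parse might look like the sketch below. The helper name looksLikePdfBase64, the strict regex, and the choice to search only the last 1024 bytes for the end-of-file marker are my own assumptions, not part of hummus or any other library, and this only checks the outer structure, not whether the PDF is actually well-formed.

```javascript
// Hypothetical helper (not part of hummus): cheap structural checks
// to run before handing the buffer to a real PDF parser.

// Strict base64: groups of 4 characters from the standard alphabet, optional padding at the end.
const STRICT_BASE64 = /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;

function looksLikePdfBase64(input) {
  if (typeof input !== 'string' || input.length === 0) {
    return false;
  }
  // Buffer.from(..., 'base64') skips invalid characters silently,
  // so the string has to be validated before decoding.
  if (!STRICT_BASE64.test(input)) {
    return false;
  }
  const buf = Buffer.from(input, 'base64');
  // A PDF file starts with the "%PDF-" header.
  if (buf.slice(0, 5).toString('latin1') !== '%PDF-') {
    return false;
  }
  // It also carries an "%%EOF" marker near the end.
  // Assumption: looking at the last 1024 bytes is enough.
  const tail = buf.slice(Math.max(0, buf.length - 1024)).toString('latin1');
  return tail.includes('%%EOF');
}
```

This rejects line-wrapped base64 and data-URL prefixes too, so strip those first if your frontend sends them. Strings that pass can then go through the hummus check for the actual parse.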

Upvotes: 3
