Neeraj Kulkarni

Reputation: 365

Reading pdf from url with node.js using PDF.js

I'm trying to extract the text of a PDF from the PDF's URL. Following the example on the pdf.js website, I understand how to render a PDF on the client side, but I'm running into issues when I do this server-side.

I downloaded the package using npm i pdfjs-dist

I tried the code below as a simple example to load the pdf:

var url = 'https://raw.githubusercontent.com/mozilla/pdf.js/ba2edeae/examples/learning/helloworld.pdf';
var pdfjsLib = require("pdfjs-dist")
var loadingTask = pdfjsLib.getDocument(url);

loadingTask.promise.then(function (pdf) {
    console.log(pdf);
}).catch(function (error){
    console.log(error)
})

But when I run this, I get the following error:

  message: 'The browser/environment lacks native support for critical functionality used by the PDF.js library (e.g. `ReadableStream` and/or `Promise.allSettled`); please use an ES5-compatible build instead.',
  name: 'UnknownErrorException',
  details: 'Error: The browser/environment lacks native support for critical functionality used by the PDF.js library (e.g. `ReadableStream` and/or `Promise.allSettled`); please use an ES5-compatible build instead.'

Any ideas on how to go about doing this? All I'm trying to do is extract the text of a PDF from its URL, server-side, using Node.js. Appreciate any input!

Upvotes: 12

Views: 13488

Answers (3)

Abhay Sehgal

Reputation: 1723

I ran into the same issue with what was then the latest version of pdfjs-dist (2.8.335) in a Node.js project. As the other answers mention, the fix is to change the path you load the library from.

But in my case the path pdfjs-dist/es5/build/pdf didn't work.

In that version it has changed to pdfjs-dist/legacy/build/pdf.js
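Since the path differs between versions, one option is to try both. This is only a sketch: loadPdfjs is a hypothetical helper name, it covers only the two paths mentioned in this thread, and it assumes a missing path fails with Node's MODULE_NOT_FOUND error code. Other pdfjs-dist versions may use different paths again.

```javascript
// Hypothetical helper: try the build paths mentioned in this thread,
// newest first, and return the first one that loads.
// `requireFn` defaults to Node's require; it is a parameter only so the
// lookup can be exercised without pdfjs-dist installed.
function loadPdfjs(requireFn = require) {
  const candidates = [
    "pdfjs-dist/legacy/build/pdf.js", // reported to work with 2.8.335
    "pdfjs-dist/es5/build/pdf.js",    // reported to work with 2.4.456
  ];
  for (const path of candidates) {
    try {
      return requireFn(path);
    } catch (error) {
      // Only swallow "path does not exist"; rethrow anything else.
      if (error.code !== "MODULE_NOT_FOUND") {
        throw error;
      }
    }
  }
  throw new Error("No ES5-compatible pdfjs-dist build found");
}
```

With pdfjs-dist installed, var pdfjsLib = loadPdfjs(); should then replace the hard-coded require call.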

Upvotes: 13

CrgioPeca88

Reputation: 1041

I had the same problem (The browser/environment lacks native support for critical functionality used by the PDF.js library (e.g. ReadableStream and/or Promise.allSettled); please use an ES5-compatible build instead.) but with Angular 8, so I'm leaving the solution here in case someone needs it:

package.json configuration:

  • Angular version: 8.2.14
  • pdfjs-dist: 2.4.456

component:

import * as pdfjs from 'pdfjs-dist/es5/build/pdf';
import { pdfjsworker } from 'pdfjs-dist/es5/build/pdf.worker.entry';

pdfjs.GlobalWorkerOptions.workerSrc = pdfjsworker;

Upvotes: 14

Neeraj Kulkarni

Reputation: 365

You need to import the ES5 build of pdf.js rather than the default build, as the error message suggests. The code below should work:

// Load the ES5-compatible build instead of the default one
var pdfjsLib = require("pdfjs-dist/es5/build/pdf.js");
var url = 'https://raw.githubusercontent.com/mozilla/pdf.js/ba2edeae/examples/learning/helloworld.pdf';
var loadingTask = pdfjsLib.getDocument(url);

loadingTask.promise.then(function (pdf) {
    // pdf is the loaded document
    console.log(pdf);
}).catch(function (error) {
    console.log(error);
});

Also check out https://github.com/mozilla/pdf.js/blob/master/examples/node/getinfo.js for a working example with Node.js.
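The question asks for the text rather than just the loaded document, so here is a minimal sketch of that step. extractText and itemsToLine are hypothetical helper names; the sketch assumes the loaded document exposes numPages and getPage(), and that each page's getTextContent() resolves to an object whose items carry their text in str. Joining fragments with spaces and pages with newlines is a simplification that ignores the original layout.

```javascript
// Join the text fragments of one page into a single line.
// Items without a string `str` are skipped.
function itemsToLine(items) {
  return items
    .filter((item) => typeof item.str === "string")
    .map((item) => item.str)
    .join(" ");
}

// Hypothetical helper: collect the text of every page of a loaded document.
// `pdf` is the object that loadingTask.promise resolves to.
async function extractText(pdf) {
  const pages = [];
  // Pages are numbered from 1, not 0.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const textContent = await page.getTextContent();
    pages.push(itemsToLine(textContent.items));
  }
  // One line per page.
  return pages.join("\n");
}
```

To use it with the code above, replace console.log(pdf) with something like extractText(pdf).then(console.log), or chain it as loadingTask.promise.then(extractText).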

Upvotes: 14
