Reputation: 81
I am creating a screenshot generator using Puppeteer and Node.js. It works fine for normal web pages, but for PDF pages it gives the same error every time I run it.
Here's the code (first example from https://github.com/GoogleChrome/puppeteer):
const puppeteer = require('puppeteer');
(async () => {
try {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf');
await page.screenshot({ path: 'example.png' });
await browser.close();
} catch (err) {
console.log(err);
}
})();
The error that I get:
Error: net::ERR_ABORTED at https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf
at navigate (C:\MEAN\puppeteer-demo\node_modules\puppeteer\lib\FrameManager.js:121:37)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at Frame.<anonymous> (C:\MEAN\puppeteer-demo\node_modules\puppeteer\lib\helper.js:110:27)
at Page.goto (C:\MEAN\puppeteer-demo\node_modules\puppeteer\lib\Page.js:629:49)
at Page.<anonymous> (C:\MEAN\puppeteer-demo\node_modules\puppeteer\lib\helper.js:111:23)
at C:\MEAN\puppeteer-demo\index.js:7:20
at process._tickCallback (internal/process/next_tick.js:68:7)
Any help is appreciated. I'm also open to any other possible solutions.
Upvotes: 8
Views: 6665
Reputation: 1045
As @kalana-perera mentioned, @aaditya-chakravarty's solution was low resolution and stretched. I made some modifications to output a full, undistorted image of the PDF's first page.
This uses TypeScript with the latest version of PDF.js.
import puppeteer from "puppeteer";
async function generatePdfPreview(pdfUrl: string) {
const browser = await puppeteer.launch({
headless: "new",
defaultViewport: null,
args: [
"--no-sandbox",
"--disable-setuid-sandbox",
"--disable-web-security",
"--disable-features=IsolateOrigins",
"--disable-site-isolation-trials",
],
});
const page = await browser.newPage();
await page.setContent(
previewCreatorPage(pdfUrl)
);
await page.waitForSelector("#renderingComplete");
await page.waitForNetworkIdle();
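const pdfPage = await page.$("#page");
// Await the capture so the browser can be closed safely afterwards.
const screenshot = await pdfPage!.screenshot({
type: "png",
omitBackground: true,
});
// Close the browser so the Chromium process does not leak.
await browser.close();
return screenshot;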
}
function previewCreatorPage(url: string) {
return `<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<style>
body {
width: 100vw;
height: 100vh;
margin: 0px;
}
#page {
display: flex;
width: 100%;
}
</style>
<title>Document</title>
</head>
<body>
<canvas id="page"></canvas>
<script src="https://mozilla.github.io/pdf.js/build/pdf.js"></script>
<script>
var pdfjsLib = window['pdfjs-dist/build/pdf'];
(async () => {
const pdf = await pdfjsLib.getDocument('${url}').promise;
const page = await pdf.getPage(1);
const viewport = page.getViewport({ scale: 1 });
const canvas = document.getElementById('page');
const context = canvas.getContext('2d');
canvas.height = viewport.height;
canvas.width = viewport.width;
const renderContext = {
canvasContext: context,
viewport: viewport
};
await page.render(renderContext).promise;
const completeElement = document.createElement("span");
completeElement.id = 'renderingComplete';
document.body.append(completeElement);
})();
</script>
</body>
</html>
`;
}
Changes:
defaultViewport: null will allow larger images than 800x600.
Set width: 100% and removed height: 100% on #page, so the canvas is no longer stretched.
Takes a screenshot of only the canvas (#page) in the page instead of the whole thing.
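If the output is still too small, one option is to derive the PDF.js render scale from a desired output width instead of hard-coding scale: 1. The following is a minimal sketch; the helper name scaleForWidth and the example widths are assumptions for illustration and are not part of the original answer or of PDF.js.

```javascript
// Hypothetical helper (not a PDF.js API): choose a render scale so that the
// canvas ends up `targetWidth` pixels wide.
function scaleForWidth(pageWidthAtScale1, targetWidth) {
  if (!(pageWidthAtScale1 > 0) || !(targetWidth > 0)) {
    throw new RangeError('widths must be positive numbers');
  }
  return targetWidth / pageWidthAtScale1;
}

// Inside the page script it could be used like this, assuming `page` is the
// PDF.js page object from the template:
//   const base = page.getViewport({ scale: 1 });
//   const viewport = page.getViewport({ scale: scaleForWidth(base.width, 1600) });

console.log(scaleForWidth(612, 1224)); // 2
```

A US Letter page is 612 points wide at scale 1, so a target width of 1224 pixels gives a scale of 2.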
Upvotes: 3
Reputation: 123
For anyone stumbling on this question now: I did it using a combination of Puppeteer, EJS and PDF.js, since Puppeteer by itself cannot display PDF files.
My approach was to use EJS to dynamically insert the PDF's URL into a page that renders it through PDF.js, and then have Puppeteer take a screenshot of that page.
Here's the JS part:
const ejs = require('ejs');
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
args: [
'--disable-web-security',
'--disable-features=IsolateOrigins',
'--disable-site-isolation-trials'
]
});
const page = await browser.newPage();
const url = "https://example.com/test.pdf";
const html = await ejs.renderFile('./template.ejs', { data: { url } });
await page.setContent(html);
await page.waitForNetworkIdle();
const image = await page.screenshot({ encoding: 'base64' });
await browser.close();
console.log('Image: ', image);
})();
I added Chromium args to the Puppeteer launch options so that the PDF file can be loaded without CORS restrictions, as per this answer.
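A possible alternative to disabling web security is to download the PDF in Node and hand the bytes to PDF.js directly, so the page never makes a cross-origin request. This is only a sketch under assumptions: the helper names (toBase64, inlineLoaderSnippet, loaderForUrl) are hypothetical, and it relies on Node 18+ for the global fetch and on PDF.js accepting a binary string through getDocument({ data }).

```javascript
// Encode raw PDF bytes (ArrayBuffer, Uint8Array or Buffer) as base64.
function toBase64(bytes) {
  return Buffer.from(bytes).toString('base64');
}

// Build the loader expression for the page script. atob() converts the
// base64 text back into a binary string, which PDF.js accepts as `data`.
function inlineLoaderSnippet(base64) {
  return `pdfjsLib.getDocument({ data: atob(${JSON.stringify(base64)}) })`;
}

// Hypothetical wrapper: download `url` in Node and return the snippet.
async function loaderForUrl(url) {
  const res = await fetch(url); // global fetch, Node 18+
  if (!res.ok) throw new Error(`Download failed: ${res.status}`);
  return inlineLoaderSnippet(toBase64(await res.arrayBuffer()));
}

console.log(inlineLoaderSnippet(toBase64(new Uint8Array([37, 80, 68, 70]))));
// pdfjsLib.getDocument({ data: atob("JVBERg==") })
```

The returned expression would take the place of the getDocument('...') call in the EJS template. It has to be emitted unescaped (<%- %> rather than <%= %>), and large PDFs will make the generated HTML correspondingly large.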
Here's the EJS template:
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<style>
body {
width: 100vw;
height: 100vh;
margin: 0;
}
#page {
display: flex;
width: 100%;
height: 100%;
}
</style>
<title>Document</title>
</head>
<body>
<canvas id="page"></canvas>
<script src="https://unpkg.com/[email protected]/build/pdf.min.js"></script>
<script>
(async () => {
const pdf = await pdfjsLib.getDocument('<%= data.url %>');
const page = await pdf.getPage(1);
const viewport = page.getViewport(1);
const canvas = document.getElementById('page');
const context = canvas.getContext('2d');
canvas.height = viewport.height;
canvas.width = viewport.width;
const renderContext = {
canvasContext: context,
viewport: viewport
};
page.render(renderContext);
})();
</script>
</body>
</html>
Do note that this code will take a screenshot of only the first page.
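To go beyond the first page, the page script could loop over pdf.numPages and render each page into its own canvas. The sketch below is an assumption-laden illustration: renderAllPages and the createCanvas factory are hypothetical names, and it uses the getViewport({ scale }) form from newer PDF.js releases, whereas the older build used in this template takes a plain number.

```javascript
// Render every page of a loaded PDF.js document into its own canvas.
// `createCanvas` is a hypothetical factory; in the browser it could be
//   () => document.body.appendChild(document.createElement('canvas'))
async function renderAllPages(pdf, createCanvas, scale = 1) {
  const canvases = [];
  for (let n = 1; n <= pdf.numPages; n++) { // PDF.js pages are 1-based
    const page = await pdf.getPage(n);
    const viewport = page.getViewport({ scale });
    const canvas = createCanvas();
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    // Wait for each page to finish before starting the next one.
    await page.render({ canvasContext: canvas.getContext('2d'), viewport }).promise;
    canvases.push(canvas);
  }
  return canvases;
}
```

On the Puppeteer side, each canvas could then be captured separately, for example by iterating over the handles returned by page.$$('canvas') and calling screenshot() on each.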
Upvotes: 2
Reputation: 141
Chromium cannot open PDF files in headless mode. Launch it with headless: false instead:
await puppeteer.launch({ args: ['--no-sandbox'], headless: false });
Upvotes: 0
Reputation: 25280
Headless Chrome is not able to visit PDF pages and will throw Error: net::ERR_ABORTED, as you are experiencing. Although you can visit a PDF document with headless: false, taking a screenshot will also fail, as the PDF is not a real website and is actually rendered inside a separate view.
What you can do instead is download the PDF file and use PDF.js to create an image of the page. You might want to check out other information on the topic of "pdf to image" or "pdf preview". There are multiple questions on Stack Overflow regarding that topic, and also examples on the PDF.js page itself.
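Since the generator has to treat PDFs differently from normal pages, it may help to check what a URL points to before calling page.goto. The following is a rough sketch, assuming Node 18+ (global fetch) and a server that answers HEAD requests with an accurate Content-Type header; the function names are hypothetical.

```javascript
// Decide whether a Content-Type header value denotes a PDF.
function isPdfContentType(headerValue) {
  if (typeof headerValue !== 'string') return false;
  // Drop parameters such as "; charset=binary" and normalise case.
  const mediaType = headerValue.split(';')[0].trim().toLowerCase();
  return mediaType === 'application/pdf';
}

// Hypothetical wrapper: issue a HEAD request and report whether the
// server says the resource is a PDF.
async function urlPointsToPdf(url) {
  const res = await fetch(url, { method: 'HEAD' });
  return isPdfContentType(res.headers.get('content-type'));
}

console.log(isPdfContentType('application/pdf; charset=binary')); // true
console.log(isPdfContentType('text/html; charset=utf-8'));        // false
```

When urlPointsToPdf resolves to true, the generator could switch to the PDF.js route; otherwise the ordinary page.goto plus page.screenshot path applies. Servers that reject HEAD or mislabel their responses would need a fallback.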
Upvotes: 4