Reputation: 95
I am trying to write an Azure Function in Node.js that converts a PDF to an image, but without success. I am writing it directly in the Azure portal, using the pdf-poppler package as is. Here sourcepdf and targetimage are my blob containers.
Below is the code:
const pdf = require('pdf-poppler');
const path = require('path');
const fs = require('fs');
const URL = require('url');
const storage = require('azure-storage');
module.exports = async function (context, myBlob) {
  context.log(context.bindingData.blobTrigger);
  //context.log(context.bindingData.uri);
  let file = '/sourcepdf/sample.pdf';
  let opts = {
    format: 'jpeg',
    out_dir: '/targetimage/sample.jpg',
    out_prefix: path.baseName(file, path.extname(file)),
    page: null
  }
  pdf.convert(file, opts)
    .then(res => {
      console.log('Successfully converted');
    })
    .catch(error => {
      console.error(error);
    })
  //context.log("JavaScript blob trigger function processed blob \n Blob:", context.bindingData.blobTrigger, "\n Blob Size:", myBlob.length, "Bytes");
};
Any suggestions?
Upvotes: 3
Views: 1378
Reputation: 1156
You mention in your bounty you are looking for a function that uploads directly to a blob storage and uses async/await.
To upload directly to blob storage you want to add a blob storage output binding to your function.
Your function.json file will look something like this:
{
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "res"
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "outBlob",
      "path": "my-container/{rand-guid}.jpg",
      "connection": "AzureWebJobsStorage"
    }
  ]
}
This output binding will be available in the function as context.bindings.outBlob.
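As a rough illustration of how that binding is used, here is a minimal sketch that runs outside Azure. The handler and the fakeContext object are hypothetical stand-ins for what the Functions runtime would provide; only the name outBlob comes from the function.json above.

```javascript
// Hypothetical handler: whatever is assigned to context.bindings.outBlob
// is the value the runtime would write to the blob named in function.json.
async function handler(context, req) {
  context.bindings.outBlob = Buffer.from(req.body);
  context.log("Prepared " + context.bindings.outBlob.length + " bytes for upload.");
}

// Stand-in for the context object that the Functions runtime passes in.
const fakeContext = { bindings: {}, log: console.log };

handler(fakeContext, { body: "hello" });
```

Nothing is uploaded when this runs locally; it only shows that the function's job is to assign the data to the binding.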
To make callback-based methods awaitable in JavaScript, we can use the util.promisify function, as Microsoft recommends in this example.
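To show what util.promisify does in isolation, here is a small sketch. The addLater function is a made-up callback-style function standing in for something like fs.readFile; it is not part of any library.

```javascript
const util = require("util");

// Hypothetical callback-style function: reports its result through
// a Node-style (error, value) callback instead of returning it.
function addLater(a, b, callback) {
  setImmediate(() => {
    if (typeof a !== "number" || typeof b !== "number") {
      callback(new TypeError("both arguments must be numbers"));
      return;
    }
    callback(null, a + b);
  });
}

// util.promisify wraps it so that it returns a Promise instead.
const addLaterAsync = util.promisify(addLater);

async function main() {
  const sum = await addLaterAsync(2, 3);
  console.log(sum); // 5
}

main().catch(console.error);
```

An error passed to the callback becomes a rejected promise, so it can be handled with try/catch around the await.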
Finally, we need to read the converted file into memory with fs, because the pdf-poppler library cannot return its output in memory; it always writes the result to disk.
I have created an example Azure Function that takes an HTTP POST trigger, converts a single-page PDF to an image and saves it to Azure Blob Storage.
const fs = require("fs");
const fsPromises = require("fs/promises");
const util = require("util");
const pdf = require("pdf-poppler");
const os = require("os");
const path = require("path");

// Use async/await pattern as recommended by Microsoft:
// https://learn.microsoft.com/en-us/azure/azure-functions/functions-reference-node?tabs=v2#use-async-and-await
const readFileAsync = util.promisify(fs.readFile);
const writeFileAsync = util.promisify(fs.writeFile);

// Trigger this function with an HTTP POST with a PDF file encoded via form-data
module.exports = async function (context, req) {
  context.log("JavaScript HTTP trigger function is processing a request.");

  if (!req.body) {
    return { status: 400, body: "No PDF file was provided!" };
  }

  // Create temp directory
  const tempPath = await fsPromises.mkdtemp(os.tmpdir() + path.sep);
  const pdfLocation = path.join(tempPath, "my-pdf.pdf");

  // Save HTTP body for further processing
  await writeFileAsync(pdfLocation, req.body, "binary");

  // Convert PDF to JPEG ("jpeg" is the format name used elsewhere in this thread)
  await pdf.convert(pdfLocation, {
    format: "jpeg",
    out_dir: tempPath,
    out_prefix: "my-image",
    page: 1
  });

  // Read local file into memory and set as output binding
  context.bindings.outBlob = await readFileAsync(path.join(tempPath, "my-image-1.jpg"));

  return {
    status: 200,
    body: "Your PDF file has been converted to a JPEG file and uploaded to Azure Blob Storage."
  };
};
Make sure that when you deploy to your Azure Functions app you use web deploy, or that the environment variable WEBSITE_RUN_FROM_PACKAGE is set to 0. Otherwise, your file system will be read-only and the function will fail!
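If you are unsure whether a directory is writable in your deployment, a quick probe like the one below can help narrow it down. This is only a sketch: isWritable is a hypothetical helper, and it simply tries to create and remove a scratch directory.

```javascript
const fsPromises = require("fs/promises");
const os = require("os");
const path = require("path");

// Hypothetical helper: resolves to true if a scratch directory can be
// created inside `dir`, and to false otherwise (for example when the
// directory is missing or the file system is read-only).
async function isWritable(dir) {
  try {
    const probe = await fsPromises.mkdtemp(path.join(dir, "probe-"));
    await fsPromises.rmdir(probe);
    return true;
  } catch (err) {
    return false;
  }
}

isWritable(os.tmpdir()).then((ok) => console.log("temp dir writable:", ok));
```

Logging the result for both os.tmpdir() and the function folder should make it clear where the conversion output can be written.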
Processing multi-page PDFs is left as an exercise for the reader.
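As a possible starting point for that exercise, the sketch below collects the generated page images from a directory and orders them by page number. It assumes the output files follow the "<prefix>-<page>.jpg" naming seen above (my-image-1.jpg); listPageImages and escapeRegExp are hypothetical helpers, not part of pdf-poppler.

```javascript
const fsPromises = require("fs/promises");
const os = require("os");
const path = require("path");

// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the full paths of all "<prefix>-<page>.jpg"
// files in `dir`, sorted numerically by page (so page 10 comes after page 2,
// and zero-padded page numbers are handled as well).
async function listPageImages(dir, prefix) {
  const pattern = new RegExp("^" + escapeRegExp(prefix) + "-(\\d+)\\.jpg$");
  const entries = await fsPromises.readdir(dir);
  return entries
    .map((name) => ({ name, match: pattern.exec(name) }))
    .filter((entry) => entry.match !== null)
    .sort((a, b) => Number(a.match[1]) - Number(b.match[1]))
    .map((entry) => path.join(dir, entry.name));
}

// Example call; prints an empty array if no matching files exist.
listPageImages(os.tmpdir(), "my-image").then((files) => console.log(files));
```

Each returned path could then be read with fs and uploaded. Note that a single blob output binding holds one value, so storing several pages would need a different upload mechanism.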
Upvotes: 0
Reputation: 14324
Below is my working code:
// Required at the top of the function file
const pdf = require('pdf-poppler');
const path = require('path');

// Inside the function body
context.log('JavaScript HTTP trigger function processed a request.');
let file = 'D:\\home\\site\\wwwroot\\nodejs.pdf';
let opts = {
  format: 'jpeg',
  out_dir: path.dirname(file),
  out_prefix: path.basename(file, path.extname(file)),
  page: null
};
pdf.convert(file, opts)
  .then(res => {
    console.log('Successfully converted');
  })
  .catch(error => {
    console.error(error);
  });
Alternatively, you can set the output directory and file name prefix yourself. For example, out_dir could be context.executionContext.functionDirectory and out_prefix could be a plain string such as output. The images will then be created in the function's folder.
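To make the two options a bit more concrete, here is a small sketch that builds the options object from a file path. buildOptions is a hypothetical helper and the path is only an example; note that the Node method is path.basename, with a lowercase n.

```javascript
const path = require("path");

// Hypothetical helper: builds a pdf-poppler style options object for `file`.
// `outDir` defaults to the directory that contains the PDF itself.
function buildOptions(file, outDir = path.dirname(file)) {
  return {
    format: "jpeg",
    out_dir: outDir,
    // File name without its extension, e.g. "nodejs" for "nodejs.pdf".
    out_prefix: path.basename(file, path.extname(file)),
    page: null
  };
}

console.log(buildOptions("/home/site/wwwroot/nodejs.pdf"));
```

Inside a function you could pass context.executionContext.functionDirectory as the second argument to write the images into the function folder instead.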
Upvotes: 2