user3432478

Reputation: 95

Azure Function to convert PDF to image in Node.js?

I am trying to write an Azure Function that converts a PDF to an image in Node.js, but without success. I am writing the code directly in the Azure portal and using the pdf-poppler package as-is. Here, sourcepdf and targetimage are my blob containers.

Below is the code:

const pdf = require('pdf-poppler');
const path = require('path');
const fs = require('fs');
const URL = require('url');


const storage = require('azure-storage');


module.exports = async function (context, myBlob) {

context.log(context.bindingData.blobTrigger);
//context.log(context.bindingData.uri);
let file = '/sourcepdf/sample.pdf';

let opts = {
    format: 'jpeg',
    out_dir: '/targetimage/sample.jpg',
    out_prefix: path.baseName(file, path.extname(file)),
    page: null
}
pdf.convert(file, opts)
    .then(res => {
        console.log('Successfully converted');
    })
    .catch(error => {
        console.error(error);
    })

    //context.log("JavaScript blob trigger function processed blob \n Blob:",  context.bindingData.blobTrigger, "\n Blob Size:", myBlob.length, "Bytes");     

};

Any suggestions?

Upvotes: 3

Views: 1378

Answers (2)

Tiamo Idzenga

Reputation: 1156

You mention in your bounty you are looking for a function that uploads directly to a blob storage and uses async/await.

To upload directly to blob storage you want to add a blob storage output binding to your function.

Your function.json file will look something like this:

{
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "res"
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "outBlob",
      "path": "my-container/{rand-guid}.jpg",
      "connection": "AzureWebJobsStorage"
    }
  ]
}

This output binding will be available in the function as context.bindings.outBlob.

To make callback-based methods awaitable in JavaScript, we can use the util.promisify function, as is recommended by Microsoft in this example.
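As a minimal sketch of what util.promisify does, the snippet below wraps the callback-style fs.readFile and fs.writeFile so they can be awaited. The helper name roundTrip, the temp-directory prefix and the file name are made up for illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const util = require("util");

// Wrap the callback-style APIs so that they return promises instead.
const readFileAsync = util.promisify(fs.readFile);
const writeFileAsync = util.promisify(fs.writeFile);

// Hypothetical helper: writes text to a fresh temp file and reads it back.
async function roundTrip(text) {
    const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), "promisify-"));
    const file = path.join(dir, "sample.txt");
    await writeFileAsync(file, text, "utf8");
    return readFileAsync(file, "utf8");
}
```

On current Node.js versions, fs.promises (or require("fs/promises")) exposes promise-returning versions of the same functions, so the wrapping step is optional there.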

Finally, to fulfill the requirements we need to read the file to memory with fs as the pdf-poppler library does not support saving a file to memory, always saving the output of the function on disk.

I have created an example Azure Function that takes an HTTP POST trigger, converts a single-page PDF to an image and saves it to Azure Blob Storage.

const fs = require("fs");
const fsPromises = require("fs/promises");
const util = require("util");
const pdf = require("pdf-poppler");
const os = require("os");
const path = require("path");

// Use async/await pattern as recommended by Microsoft:
// https://learn.microsoft.com/en-us/azure/azure-functions/functions-reference-node?tabs=v2#use-async-and-await
const readFileAsync = util.promisify(fs.readFile);
const writeFileAsync = util.promisify(fs.writeFile);

// Trigger this function with an HTTP POST with a PDF file encoded via form-data
module.exports = async function (context, req) {
    context.log("JavaScript HTTP trigger function is processing a request.");

    if (!req.body) {
        return { status: 400, body: "No PDF file was provided!" };
    }

    // Create temp directory
    const tempPath = await fsPromises.mkdtemp(os.tmpdir() + path.sep);
    const pdfLocation = path.join(tempPath, "my-pdf.pdf");

    // Save HTTP body for further processing
    await writeFileAsync(pdfLocation, req.body, "binary");

    // Convert PDF to JPEG
    await pdf.convert(pdfLocation, {
        format: "jpeg",
        out_dir: tempPath,
        out_prefix: "my-image",
        page: 1
    });

    // Read local file into memory and set as output binding
    context.bindings.outBlob = await readFileAsync(path.join(tempPath, "my-image-1.jpg"));

    return {
        status: 200,
        body: "Your PDF file has been converted to a JPEG file and uploaded to Azure Blob Storage."
    };
}

Make sure that when deploying to your Azure Functions app you use web deploy, or that the environment variable WEBSITE_RUN_FROM_PACKAGE is set to 0. Otherwise, your file system will be read-only and the function will fail!

Being able to process multi-page PDFs is an exercise that is left up to the reader.
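As a starting point for that exercise, here is a rough sketch of how the generated page images could be collected after converting with page: null. The helper names listPageImages and escapeRegExp are hypothetical, and the code assumes the output files are named "<prefix>-<page>.jpg" (possibly zero-padded), as in the single-page example above.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: returns the generated page images in page order.
// Assumes files are named "<prefix>-<page>.jpg", with optional zero padding.
async function listPageImages(dir, prefix) {
    const pattern = new RegExp("^" + escapeRegExp(prefix) + "-(\\d+)\\.jpg$");
    const entries = await fs.promises.readdir(dir);
    return entries
        .map((name) => ({ name, match: pattern.exec(name) }))
        .filter((entry) => entry.match !== null)
        .map((entry) => ({
            page: Number(entry.match[1]),
            file: path.join(dir, entry.name)
        }))
        .sort((a, b) => a.page - b.page); // numeric sort, so page 10 follows page 9
}

// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
```

Each returned file could then be read into memory as in the function above. Note that the single outBlob binding shown earlier writes one blob per invocation, so uploading several pages would likely need a different approach, such as the storage SDK.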

Upvotes: 0

George Chen

Reputation: 14324

Below is my working code:

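const pdf = require('pdf-poppler');
const path = require('path');

module.exports = async function (context, req) {
    context.log('JavaScript HTTP trigger function processed a request.');
    let file = 'D:\\home\\site\\wwwroot\\nodejs.pdf';

    let opts = {
        format: 'jpeg',
        out_dir: path.dirname(file),
        out_prefix: path.basename(file, path.extname(file)),
        page: null
    };

    // Await the conversion so the function does not return before it finishes
    try {
        await pdf.convert(file, opts);
        context.log('Successfully converted');
    } catch (error) {
        context.log.error(error);
    }
};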

Apart from this, you can choose the output directory and the file name prefix yourself. For example, out_dir could be context.executionContext.functionDirectory and out_prefix could be a plain string such as output. The images will then be created under the function folder.
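A minimal sketch of that variant is shown below. The helper name buildConvertOptions and its default prefix are made up for illustration, and it assumes the context object exposes executionContext.functionDirectory as described above.

```javascript
// Hypothetical helper: builds pdf-poppler options that write the images
// into the function's own folder, using a fixed file name prefix.
function buildConvertOptions(context, prefix = "output") {
    return {
        format: "jpeg",
        out_dir: context.executionContext.functionDirectory,
        out_prefix: prefix,
        page: null // no specific page selected, as in the code above
    };
}
```

The result would be passed as the second argument to pdf.convert(file, opts) in place of the opts object above.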


Upvotes: 2
