Xavi Andreu

Reputation: 101

How to extract the text of a PDF file using Azure Functions?

I want to create an Azure Function that gets triggered anytime a file is uploaded to blob storage and extracts the text of a PDF file. I don't know what would be the best library to use either.

I found this post that shows how to use PdfSharp to extract the text of a PDF file, but I can't get it working, since this is my first time using Azure Functions.

Upvotes: 1

Views: 2577

Answers (1)

Rob Reagan

Reputation: 7686

This question is overly broad and will probably be closed as such. But here are some pointers.

  1. Start by installing the Azure Storage Emulator so that you can create Blobs locally for testing. Get it here.
  2. Create an Azure Function v2. Set up a Blob Storage Trigger so that whenever something is written to your local storage, the trigger will be called. Blob trigger described here.
  3. Once you can hit a breakpoint in your Azure Function when a Blob is added to your local emulator, read the blob's bytes and extract the text using a PDF text-extraction library of your choice. There are many; some are free and some are paid. Suggesting one and giving code examples could run several thousand words, so it's up to you which one you pick and use.
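
To make step 3 a bit more concrete, here is a minimal sketch of what the function body could look like. It is written in Python rather than C#, which is an assumption on my part, and it uses the third-party pypdf package purely as one example of a text extractor; any other library can be passed in instead. The helper name `handle_blob` is hypothetical, and the sketch assumes a blob trigger binding named `myblob` is declared in the function's `function.json`.

```python
import io
import logging
from typing import Callable


def extract_text_with_pypdf(pdf_bytes: bytes) -> str:
    """Extract text with pypdf (third-party; an assumed choice, not a recommendation)."""
    # Imported lazily so the rest of the module loads without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(pdf_bytes))
    # extract_text() may return nothing for image-only pages, hence the "or".
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def handle_blob(blob, extract: Callable[[bytes], str] = extract_text_with_pypdf) -> str:
    """Read the blob and run the extractor over its bytes.

    `blob` is anything exposing .name and .read(), which is what the
    blob trigger passes in (azure.functions.InputStream).
    """
    pdf_bytes = blob.read()
    logging.info("Processing blob %s (%d bytes)", blob.name, len(pdf_bytes))
    return extract(pdf_bytes)


def main(myblob) -> None:
    """Blob trigger entry point; 'myblob' must match the binding name (assumed)."""
    text = handle_blob(myblob)
    logging.info("Extracted %d characters from %s", len(text), myblob.name)
```

Keeping the extractor as a parameter means you can test the trigger plumbing against the emulator with a stub first, then swap in whichever PDF library you settle on.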

Upvotes: 1
