AlexanderJ

Reputation: 115

Most practical way to read an Azure Blob (PDF) in the Cloud?

I'm somewhat of a beginner and have never dealt with cloud-based solutions before.

My program uses the PDFBox library to extract data from PDFs and rename the file based on the data. It's all local currently, but eventually will need to be deployed as an Azure Function. The PDFs will be stored in an Azure Blob Container - the Azure Blob Storage trigger for Azure Functions is an important reason for this choice.

Of course I can download the blob locally and read it, but the program should run solely in the Cloud. I've tried reading the blobs directly using Java, but this resulted in gibberish data that wasn't compatible with PDFBox. My plan for now is to temporarily store the files elsewhere in the Cloud (e.g. OneDrive, Azure File Storage) and try opening them from there. However, this seems like it could quickly turn into an overly messy solution. My questions:

(1) Is there any way a blob can be opened as a File, rather than a CloudBlockBlob so this additional step isn't needed?

(2) If not, what would be a recommended temporary storage option in this case?

(3) Are there any alternative ways to approach this issue?

Upvotes: 0

Views: 792

Answers (1)

krishg

Reputation: 6508

Since you are planning to use an Azure Function, you can use the blob trigger binding to receive the blob's bytes directly. You can then pass them to PDFBox's PDDocument.load(content) to build the PDDocument (this is the PDFBox 2.x call; in PDFBox 3.x the equivalent is Loader.loadPDF(content)). You won't need any temporary storage to load the file.

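import java.io.IOException;
import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.*;
import org.apache.pdfbox.pdmodel.PDDocument;

@FunctionName("blobprocessor")
public void run(
  @BlobTrigger(name = "file",
               dataType = "binary",
               path = "myblob/{name}",
               connection = "MyStorageAccountAppSetting") byte[] content,
  @BindingName("name") String filename,
  final ExecutionContext context
) throws IOException {
  context.getLogger().info("Name: " + filename + " Size: " + content.length + " bytes");
  // PDDocument.load can throw IOException; try-with-resources closes the document afterwards
  try (PDDocument doc = PDDocument.load(content)) {
    // extract your data from doc here
  }
}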
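As for the gibberish you saw when reading the blob directly: the question doesn't show that code, so this is only a guess, but a common cause is decoding the blob as text instead of keeping it as raw bytes. A PDF is binary, and a round trip through a String can silently change the bytes. The sketch below uses only the JDK; the class and method names (BlobBytesCheck, looksLikePdf, survivesTextRoundTrip) are made up for illustration and are not part of PDFBox or the Azure SDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class BlobBytesCheck {

    // Returns true when the payload starts with the PDF header "%PDF-".
    static boolean looksLikePdf(byte[] content) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        return content != null
                && content.length >= magic.length
                && Arrays.equals(Arrays.copyOf(content, magic.length), magic);
    }

    // Decodes the bytes as UTF-8 text, encodes them again, and reports
    // whether the result is byte-for-byte identical to the input.
    static boolean survivesTextRoundTrip(byte[] content) {
        byte[] roundTripped = new String(content, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(content, roundTripped);
    }

    public static void main(String[] args) {
        // A PDF header followed by a comment line of high-bit bytes,
        // which many PDF writers emit to mark the file as binary.
        byte[] sample = {'%', 'P', 'D', 'F', '-', '1', '.', '7', '\n',
                '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
        System.out.println("Looks like a PDF: " + looksLikePdf(sample));
        System.out.println("Survives text round trip: " + survivesTextRoundTrip(sample));
    }
}
```

If a check like looksLikePdf(content) fails inside the function, the bytes were probably altered before they reached PDFBox. Keeping dataType = "binary" and a byte[] parameter, as in the function above, avoids the text decoding step entirely.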

Upvotes: 2
