Reputation: 279
I am trying to use the C# Form Recognizer SDK from Azure Cognitive Services. I have PDFs stored in Azure Blob Storage, and I need to extract text and tables from them using the C# SDK.
The "AnalyzeWithCustomModelAsync" method takes a "Stream" as its input parameter, but in practice it seems to work only with a "FileStream". If I pass a "MemoryStream" instead, I get the following error:
{"value":{"error":{"code":"UnsupportedMediaType","message":"In case of HTML form data, the multipart request must contain a document with a media type of - 'application/pdf', 'image/jpeg' or 'image/png'."}},"formatters":[],"contentTypes":[],"statusCode":415}
Is there any way I can use my blob files directly, without having to save them locally first?
Regards, Madhu
Upvotes: 0
Views: 438
Reputation: 187
The following code snippet works by getting a reference to the blob (as a CloudBlockBlob) and then downloading it into a MemoryStream. Once you have that, you can pass the stream to Form Recognizer for analysis.
// Names of the blobs to analyze (populate before running)
List<string> blobsToAnalyze = new List<string>();
// Get latest Form Recognizer training model ID
Guid aiTrainModelId = Guid.Empty;
ModelResult latestModel = await FormRecognizer.GetModelAsync(config, log);
if (latestModel != null)
    aiTrainModelId = latestModel.ModelId;
// Iterate through all blobs
foreach (string strBlob in blobsToAnalyze)
{
    CloudBlockBlob blob = blobContainer.GetBlockBlobReference(strBlob);
    using (MemoryStream ms = new MemoryStream())
    {
        // Load blob into a MemoryStream object
        await blob.DownloadToStreamAsync(ms);
        // Rewind: the download leaves the position at the end of the stream
        ms.Position = 0;
        // Send to Form Recognizer to analyze
        AnalyzeResult results = await FormRecognizer.AnalyzeFormAsync(config, aiTrainModelId, ms, log);
        searchResults = FormRecognizer.AnalyzeResults(config, tableClient, results, log);
    }
}
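One detail worth checking when passing a MemoryStream around: after data has been written into it, its position sits at the end, so a consumer that reads from the current position sees nothing. The sketch below illustrates this using only System.IO. The StreamRewindDemo class, the FillAsync helper and the placeholder bytes are hypothetical stand-ins for the blob download, not part of any Azure SDK, and whether this is related to the 415 error in the question is not confirmed here.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class StreamRewindDemo
{
    // Hypothetical stand-in for a blob download: copies the source bytes
    // into the target stream, leaving the target positioned at its end.
    public static async Task FillAsync(Stream source, Stream target)
    {
        await source.CopyToAsync(target);
    }

    // Returns how many bytes one 8-byte read yields before and after rewinding.
    public static async Task<(int Before, int After)> ReadCountsAsync(byte[] data)
    {
        using (var source = new MemoryStream(data))
        using (var ms = new MemoryStream())
        {
            await FillAsync(source, ms);

            var buffer = new byte[8];
            // Position equals Length here, so this read finds no data.
            int before = ms.Read(buffer, 0, buffer.Length);

            // Rewind to the start so the next reader sees the content.
            ms.Position = 0;
            int after = ms.Read(buffer, 0, buffer.Length);

            return (before, after);
        }
    }

    public static async Task Main()
    {
        // Placeholder bytes only; this is not a valid PDF.
        byte[] fakePdf = Encoding.ASCII.GetBytes("%PDF-1.7 placeholder bytes");
        var counts = await ReadCountsAsync(fakePdf);
        Console.WriteLine($"Bytes read before rewind: {counts.Before}");
        Console.WriteLine($"Bytes read after rewind: {counts.After}");
    }
}
```

With the placeholder input, the first read returns 0 bytes and the second returns 8. Setting ms.Position = 0 (or calling ms.Seek(0, SeekOrigin.Begin)) before handing the stream to the analysis call is harmless if the callee already rewinds it.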
Upvotes: 1