Madhukar Hiriadka

Reputation: 279

FormRecognizer C# SDK with blob file - unsupported media type error

I am trying to use the C# Form Recognizer SDK from Azure Cognitive Services. I have PDFs stored in Azure Blob Storage and need to extract text and tables from them using the C# SDK.

I see that the "AnalyzeWithCustomModelAsync" method takes a "Stream" as its input parameter, but in practice it seems to accept only a "FileStream". If I pass a "MemoryStream" instead, I get the following error:

{"value":{"error":{"code":"UnsupportedMediaType","message":"In case of HTML form data, the multipart request must contain a document with a media type of - 'application/pdf', 'image/jpeg' or 'image/png'."}},"formatters":[],"contentTypes":[],"statusCode":415}

Is there any way to use my blob file directly, without having to save the files locally first?

Regards, Madhu

Upvotes: 0

Views: 438

Answers (1)

TheTurkishDeveloper

Reputation: 187

The following code snippet works by getting a reference to the blob (as a CloudBlockBlob) and downloading its contents into a MemoryStream. Once you have that, you can pass the stream to Form Recognizer for analysis.

// Names of the blobs to analyze (populate this before the loop below)
List<string> blobsToAnalyze = new List<string>();

// Get latest Form Recognizer training model ID
Guid aiTrainModelId = Guid.Empty;
ModelResult latestModel = await FormRecognizer.GetModelAsync(config, log);

if (latestModel != null)
    aiTrainModelId = latestModel.ModelId;

// Iterate through all blobs
foreach (string strBlob in blobsToAnalyze)
{
    CloudBlockBlob blob = blobContainer.GetBlockBlobReference(strBlob);

    using (MemoryStream ms = new MemoryStream())
    {
        // Load blob into a MemoryStream object
        await blob.DownloadToStreamAsync(ms);

        // The download leaves the stream positioned at its end;
        // rewind so the next reader starts from the first byte
        ms.Position = 0;

        // Send to Form Recognizer to analyze
        AnalyzeResult results = await FormRecognizer.AnalyzeFormAsync(config, aiTrainModelId, ms, log);

        searchResults = FormRecognizer.AnalyzeResults(config, tableClient, results, log);
    }
}
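
One detail worth checking with this pattern: writing into a MemoryStream leaves its position at the end, so a consumer that reads from the current position sees no data unless the stream is rewound first. Whether that is related to the 415 error in the question is an assumption, not something confirmed here. The sketch below shows only the stream behaviour, using plain System.IO. FakeDownload and CountReadableBytes are hypothetical stand-ins for the blob download and the upload to the service; they are not part of any Azure SDK.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamRewindDemo
{
    // Hypothetical stand-in for a blob download: writes bytes into the
    // target stream and leaves the position at the end, as any write does.
    public static int FakeDownload(Stream target)
    {
        byte[] payload = Encoding.ASCII.GetBytes("%PDF-1.7 sample");
        target.Write(payload, 0, payload.Length);
        return payload.Length;
    }

    // Hypothetical stand-in for an upload: copies everything from the
    // stream's current position to its end and reports how much it got.
    public static int CountReadableBytes(Stream source)
    {
        using (var sink = new MemoryStream())
        {
            source.CopyTo(sink);
            return (int)sink.Length;
        }
    }

    public static void Main()
    {
        using (var ms = new MemoryStream())
        {
            int written = FakeDownload(ms);

            // Position is at the end, so nothing is left to read.
            Console.WriteLine("Without rewind: " + CountReadableBytes(ms) + " bytes");

            // Rewind, then the consumer sees the whole payload.
            ms.Position = 0;
            int read = CountReadableBytes(ms);
            Console.WriteLine("After rewind: " + read + " of " + written + " bytes");
        }
    }
}
```

If the stream is seekable, setting Position = 0 (or calling Seek(0, SeekOrigin.Begin)) right after the download is usually enough before handing it to another API.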

Upvotes: 1
