Reputation: 11
I am using the Azure.AI.DocumentIntelligence SDK for .NET to analyze a multi-page document with my custom model. However, the resulting AnalyzeResult object contains only a single page result, even though the documentation states that the SDK supports analyzing multiple pages.
Here is my code:
private async Task<AnalyzeResult> ExtractDocumentAsync(string filePath, string modelId)
{
    // Your Form Recognizer endpoint and API key
    string endpoint = system.GetAsset("AzureDU_Endpoint","Root/.General").ToString();
    string apiKey = system.GetAsset("AzureDU_ApiKey","Root/.General").ToString();
    var client = new DocumentIntelligenceClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
    // Now you can use the client to interact with the Form Recognizer service
    // For example:
    try
    {
        //string documentText = system.ReadTextFile(system.GetResourceForLocalPath(filePath,PathType.File));
        //string fileContent = Convert.ToBase64String(Encoding.UTF8.GetBytes(documentText));
        byte[] fileContent = File.ReadAllBytes(filePath);
        //Uri uriSource = new Uri("file://" + filePath);
        var content = new AnalyzeDocumentContent()
        {
            Base64Source = System.BinaryData.FromBytes(fileContent)
            //UrlSource=uriSource
            //Base64Source = BinaryData.FromBytes(Encoding.UTF8.GetBytes(fileContent))
        };
        Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(WaitUntil.Completed, modelId, content, "1,2,3");
        AnalyzeResult result = operation.Value;
        Console.WriteLine($"Document was analyzed with model with ID: {result.ModelId}");
        return result;
    }
    catch (Exception ex)
    {
        Console.WriteLine($"An error occurred: {ex.Message}");
        throw;
    }
}
Upvotes: 1
Views: 682
Reputation: 3614
For documents with multiple pages, each page is represented sequentially within the document. The model provides information about each page, including its orientation angle (indicating if the page is rotated), width, height, and other relevant details.
For PDF and TIFF files, up to 2000 pages can be processed. However, with a free tier subscription, only the first two pages are processed.
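To check how many pages actually came back, and whether the free-tier limit might be involved, something like the following sketch could be called on the result. It assumes the Azure.AI.DocumentIntelligence package is referenced. PageDiagnostics, LooksTruncated, and the expectedPages parameter are hypothetical names used only for illustration; they are not part of the SDK.

```csharp
using System;
using Azure.AI.DocumentIntelligence;

public static class PageDiagnostics
{
    // Heuristic only: exactly two pages returned from a longer document
    // is a hint (not proof) that the free-tier page limit applied.
    public static bool LooksTruncated(int returnedPages, int expectedPages) =>
        returnedPages == 2 && expectedPages > returnedPages;

    // Prints the per-page details described above (number, angle, size).
    public static void PrintPages(AnalyzeResult result, int expectedPages)
    {
        Console.WriteLine($"Pages returned: {result.Pages.Count}");

        foreach (DocumentPage page in result.Pages)
        {
            Console.WriteLine(
                $"Page {page.PageNumber}: angle={page.Angle}, " +
                $"size={page.Width}x{page.Height} {page.Unit}");
        }

        if (LooksTruncated(result.Pages.Count, expectedPages))
        {
            Console.WriteLine(
                "Only 2 pages returned; check whether the resource is on the free tier.");
        }
    }
}
```

For example, calling PageDiagnostics.PrintPages(operation.Value, expectedPages: 3) right after the analyze call would show which pages the service actually processed.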
Steps for building a custom model for multi-page documents:
Getting Started with Document Intelligence Studio:
Prerequisites for Custom Projects:
Custom Models:
// Requires: using Azure; using Azure.AI.FormRecognizer.DocumentAnalysis;
// TestEnvironment appears to come from the Azure SDK sample tests; replace it with your own values.
string endpoint = TestEnvironment.Endpoint;
string apiKey = TestEnvironment.ApiKey;
// SAS URL of the blob container holding the training documents,
// for example: new Uri("<blobContainerUri>")
Uri blobContainerUri = new Uri(TestEnvironment.BlobContainerSasUrl);
var client = new DocumentModelAdministrationClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
// Build (train) the custom model and wait for the operation to complete.
BuildDocumentModelOperation operation = await client.BuildDocumentModelAsync(WaitUntil.Completed, blobContainerUri, DocumentBuildMode.Template);
DocumentModelDetails model = operation.Value;
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Created on: {model.CreatedOn}");
Console.WriteLine(" Document types the model can recognize:");
foreach (KeyValuePair<string, DocumentTypeDetails> documentType in model.DocumentTypes)
{
    Console.WriteLine($" Document type: {documentType.Key} which has the following fields:");
    foreach (KeyValuePair<string, DocumentFieldSchema> schema in documentType.Value.FieldSchema)
    {
        Console.WriteLine($" Field: {schema.Key} with confidence {documentType.Value.FieldConfidence[schema.Key]}");
    }
}
See the custom document models documentation for Document Intelligence (formerly Form Recognizer). I used Azure's Form Recognizer service to analyze identity documents in PDF format.
Another way to analyze the pages of a PDF document with the Azure Document Intelligence SDK is described on Stack Overflow.
Upvotes: 0