How do I replace or mask text in a PDF analyzed and made searchable by Microsoft Azure Document Intelligence?

Question

Azure AI Document Intelligence can produce a searchable PDF, as described in its documentation.

The code is as follows:

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeOutputOption, AnalyzeResult

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]

document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))

# path_to_sample_documents is the path to the input PDF
with open(path_to_sample_documents, "rb") as f:
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-read",
        body=f,
        output=[AnalyzeOutputOption.PDF],
    )
result: AnalyzeResult = poller.result()
operation_id = poller.details["operation_id"]

response = document_intelligence_client.get_analyze_result_pdf(model_id=result.model_id, result_id=operation_id)
with open("analyze_result.pdf", "wb") as writer:
    writer.writelines(response)

In the code above, "result" holds the extracted text together with its layout information (pages, lines and words, each with a bounding polygon), and "response" is the finished searchable PDF, which the last lines write to disk.
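The word polygons in "result" are what tie the recognized text back to positions on the page. Below is a minimal sketch of extracting them. It assumes "result" has been converted to a plain dict with the REST field names (for example via result.as_dict(), which should yield keys such as pageNumber, words and polygon) and that page units are inches, which is what the service reports for PDF input. The helper name word_boxes and the conversion to PDF points (72 per inch) are my own assumptions, not part of the SDK.

```python
from typing import Any

# PDF user-space units are 1/72 inch (assumption: result units are inches)
POINTS_PER_INCH = 72


def word_boxes(result: dict[str, Any]) -> list[tuple[int, str, tuple[float, float, float, float]]]:
    """Return (page_number, word_text, (x0, y0, x1, y1)) for every word, in PDF points.

    Hypothetical helper. Assumes `result` is the analyze result as a plain dict
    with REST field names (pages -> words -> content/polygon) and inch units.
    The origin is the top-left corner of the page, y growing downwards.
    """
    boxes = []
    for page in result.get("pages", []):
        if page.get("unit") != "inch":
            raise ValueError(f"expected inch units, got {page.get('unit')!r}")
        for word in page.get("words", []):
            polygon = word["polygon"]  # flat list: [x1, y1, x2, y2, x3, y3, x4, y4]
            xs, ys = polygon[0::2], polygon[1::2]
            # Collapse the (possibly skewed) quadrilateral to its bounding rectangle
            rect = (
                min(xs) * POINTS_PER_INCH,
                min(ys) * POINTS_PER_INCH,
                max(xs) * POINTS_PER_INCH,
                max(ys) * POINTS_PER_INCH,
            )
            boxes.append((page["pageNumber"], word["content"], rect))
    return boxes
```

Usage would be something like boxes = word_boxes(result.as_dict()). Using the bounding rectangle rather than the exact polygon slightly over-covers skewed text, which is usually acceptable for masking.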

My problem is this: after analyzing the text in "result", I need to replace or mask some of it and save the PDF while keeping the original structure. How can I do that?
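One possible approach, sketched under assumptions rather than offered as a confirmed solution: choose the words to mask from "result", then apply redactions to the searchable PDF with the third-party PyMuPDF library (not part of the Azure SDK). PyMuPDF's apply_redactions() removes the text under each area and, by default, blanks overlapping image pixels, so the masked content is actually gone while the rest of the page is untouched. The helper names select_matches and redact_pdf are hypothetical. The sketch assumes each word is a (page_number, text, (x0, y0, x1, y1)) tuple already converted to PDF points with a top-left origin, and that the output pages are unrotated and the same size as the analyzed pages.

```python
import re
from typing import Iterable

# x0, y0, x1, y1 in PDF points, top-left origin (assumed input format)
Rect = tuple[float, float, float, float]


def select_matches(boxes: Iterable[tuple[int, str, Rect]], pattern: str) -> list[tuple[int, Rect]]:
    """Keep (page_number, rect) for every word whose full text matches `pattern`."""
    regex = re.compile(pattern)
    return [(page, rect) for page, text, rect in boxes if regex.fullmatch(text)]


def redact_pdf(src: str, dst: str, targets: list[tuple[int, Rect]], replacement: str = "") -> None:
    """Mask (black box) or overwrite (white box + replacement text) each target, then save.

    Hypothetical helper built on PyMuPDF redaction annotations.
    """
    import fitz  # PyMuPDF; imported here so select_matches works without it installed

    # Black fill hides the area; with replacement text use white so black text is readable
    fill = (1, 1, 1) if replacement else (0, 0, 0)
    doc = fitz.open(src)
    for page_number, rect in targets:
        page = doc[page_number - 1]  # Document Intelligence page numbers are 1-based
        page.add_redact_annot(fitz.Rect(rect), text=replacement or None, fill=fill)
    for page in doc:
        # Removes the text under each annotation; default also blanks image pixels
        page.apply_redactions()
    doc.save(dst, garbage=3, deflate=True)
    doc.close()
```

For example, redact_pdf("analyze_result.pdf", "masked_result.pdf", select_matches(boxes, r"\d{3}-\d{2}-\d{4}"), replacement="XXX"), where boxes is built from "result" and the output file name is arbitrary. Matching is per word, so phrases spanning several words, or words with attached punctuation, need a looser pattern or matching against the spans of the full result content instead.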


