How to retrieve a specific part of text from a PDF knowing the respective coordinates?

Question

I need help with a Python script I am writing that handles some PDF-related tasks. I am trying to retrieve a specific part of the text from a PDF given its coordinates, and I can't find a way to do it. I have checked libraries such as PyPDF2 and pdfminer, but found nothing that does this.

The PyMuPDF library (imported as the module "fitz") offers the opposite: given a string, it returns the coordinates of each occurrence of that string on a given page of the PDF file.

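# fitz (PyMuPDF) usage example
import fitz

doc = fitz.Document("pdf_name.pdf")
page_mupdf = doc.loadPage(0)  # first page; newer releases spell this load_page
# Returns one Rect per occurrence of the string on this page
areas = page_mupdf.searchFor("text_to_search", hit_max=16)  # newer: search_for
print(areas)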

[Rect(90.0, 145.8567657470703, 142.13255310058594, 156.50209045410156)]


Answers (1)
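
One possible approach, offered as a sketch rather than a definitive answer: PyMuPDF can list every word on a page together with its bounding box, so the text at known coordinates can be recovered by keeping only the words whose boxes overlap the target rectangle. The helper names words_in_rect and text_at below are hypothetical, and the code assumes a recent PyMuPDF release where the methods are spelled load_page and get_text("words") (older releases used loadPage and getText("words")).

```python
from typing import Iterable, Sequence, Tuple

# One word as reported by PyMuPDF's page.get_text("words"):
# (x0, y0, x1, y1, text, block_no, line_no, word_no)
Word = Tuple[float, float, float, float, str, int, int, int]


def words_in_rect(words: Iterable[Word], rect: Sequence[float]) -> str:
    """Join the words whose boxes overlap rect = (x0, y0, x1, y1)."""
    rx0, ry0, rx1, ry1 = rect
    hits = [
        w for w in words
        # Two boxes overlap when they overlap on both axes.
        if w[0] < rx1 and w[2] > rx0 and w[1] < ry1 and w[3] > ry0
    ]
    # Restore reading order: block, then line, then word index.
    hits.sort(key=lambda w: (w[5], w[6], w[7]))
    return " ".join(w[4] for w in hits)


def text_at(pdf_path: str, page_number: int, rect: Sequence[float]) -> str:
    """Extract the text inside rect on one page (requires PyMuPDF)."""
    import fitz  # imported lazily so words_in_rect works without PyMuPDF

    doc = fitz.open(pdf_path)
    try:
        page = doc.load_page(page_number)
        return words_in_rect(page.get_text("words"), rect)
    finally:
        doc.close()
```

With the rectangle from the question, a call would look like text_at("pdf_name.pdf", 0, (90.0, 145.86, 142.13, 156.50)). Filtering by overlap rather than strict containment is a design choice: it tolerates small rounding differences in the coordinates, at the cost of picking up neighbouring words that only partly touch the rectangle. Recent PyMuPDF releases also accept a clip rectangle, as in page.get_text("text", clip=rect), which may be simpler if your installed version supports it.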
