Vvega

Reputation: 37

Is there an easy way to find specific text in a PDF, highlight it, and print or save it to a new file?

What I'm hoping to do is automate the process of mapping out desk locations on a building floor plan that is in PDF format.

I work with a deployment team that handles IT equipment requests. We get requests with a list of user names and their locations in the building, i.e. floor number and desk location number.

My current routine is to print out a copy of the PDF floor plan for each floor and manually highlight all the desk locations on the map with a pen, then plan my route for the day based on request priority (low to high). This gets tedious when we receive a large number of requests, so I was wondering whether I can feed Python the list of desk locations and have it generate a PDF with all the locations already highlighted for me, possibly with some additional comments added to the page :)

Upvotes: 0

Views: 114

Answers (1)

Schalton

Reputation: 3104

Yes, this is possible. I've deployed it at work, so I cannot share the code.

Three approaches:

1. cv2 template matching (the drawback is that you'll need to set up each desk as a template)

2. pytesseract (for OCR) with a 'guess & check' algorithm that narrows the search area, plus a fuzzy text match to handle the poor OCR quality (this is slow: it will take several minutes per desk).

3. If the desks are numbered logically, you could simply create a coordinate dictionary with offsets for 'related' desks (this is the fastest, most accurate method).
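
To give a rough idea of what approach 3 might look like, here is a minimal sketch using only the standard library. Everything specific in it is an assumption for illustration, not the code I deployed: the desk ID format (`floor-row-number`, e.g. `3-A-04`), the `ROW_ANCHORS` table, the desk size, and the helper names `desk_rect` and `rects_by_floor` are all hypothetical. The idea is to measure one anchor desk per row by hand, then derive every other desk in that row from a fixed offset.

```python
from typing import Dict, Iterable, List, Tuple

# A rectangle in page coordinates: (x0, y0, x1, y1).
Rect = Tuple[float, float, float, float]

# Hypothetical layout data, measured once per floor plan.
# (floor, row) -> (x of desk 1, y of desk 1, horizontal step between desks)
ROW_ANCHORS: Dict[Tuple[int, str], Tuple[float, float, float]] = {
    (3, "A"): (72.0, 100.0, 40.0),
    (3, "B"): (72.0, 160.0, 40.0),
}

# Assumed desk footprint on the page; adjust to your drawing.
DESK_W, DESK_H = 30.0, 20.0


def desk_rect(desk_id: str) -> Rect:
    """Return the rectangle for a desk ID such as '3-A-04' (assumed format)."""
    floor, row, number = desk_id.split("-")
    x0, y0, step = ROW_ANCHORS[(int(floor), row)]
    # Desks are assumed to be numbered from 1, left to right along the row.
    x = x0 + (int(number) - 1) * step
    return (x, y0, x + DESK_W, y0 + DESK_H)


def rects_by_floor(desk_ids: Iterable[str]) -> Dict[int, List[Rect]]:
    """Group the rectangles to highlight by floor number."""
    floors: Dict[int, List[Rect]] = {}
    for desk_id in desk_ids:
        floor = int(desk_id.split("-")[0])
        floors.setdefault(floor, []).append(desk_rect(desk_id))
    return floors


if __name__ == "__main__":
    requests = ["3-A-01", "3-A-04", "3-B-02"]
    for floor, rects in rects_by_floor(requests).items():
        print(floor, rects)
```

The rectangles for each floor could then be handed to whatever PDF library you prefer to draw the highlights and save a new file (PyMuPDF, for example, supports highlight annotations). That drawing step is left out here because it depends on the library and on how your floor plan's coordinate system is laid out.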

Upvotes: 1
