user67348

Search for words in PDF files

Is it possible to search for words in PDF files with Delphi?

I have code that can search many other file types (EXE, DLL, TXT), but it doesn't work with PDF files.

Answers (6)

Kevin Newman

Quick PDF Library's GetPageText function can give you the words from a PDF, along with the page number and the coordinates of each word, which is sometimes useful for highlighting.

Osama Al-Maadeed

One option I have used is Microsoft's IFilter technology, which is used by Windows Desktop Search and many other products, such as SharePoint and SQL Server full-text search.

It supports almost any Office or Office-like file format, even DWG, MSG, PDF, and files inside ZIP/RAR archives, provided the corresponding filter is installed.

The easiest way to use it is to run FiltDump.exe on your files and index the text output.

To see which filters are installed on your PC, you can use IFilter Explorer. Wikipedia's IFilter page has some links.

Birger

I'm working on a project that does exactly this. My approach is to convert each PDF file to plain text (with pdftotext.exe) and build an index on the resulting text. We do the same with Word and other Office files, and it works pretty well.

Searching PDF files directly from Delphi (without an external application) is more difficult, I think. If you find anything, please post an update here, as I would also be very interested in that.
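The convert-then-index approach can be sketched in Python as below. This is a minimal sketch assuming pdftotext (from Xpdf or Poppler) is on the PATH; passing `-` as the output file makes it write the text to stdout. The helper names `pdf_to_text`, `build_index` and `search`, the `\w+` word-splitting rule, and the AND-style query are illustrative choices, not part of the original answer.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path

# Hypothetical word rule: runs of letters, digits and underscores.
WORD = re.compile(r"\w+")


def pdf_to_text(pdf_path: Path) -> str:
    """Run pdftotext and capture its output ("-" as output file means stdout)."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def build_index(texts: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map each lower-cased word to the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc, text in texts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(doc)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

For example, `search(build_index({p: pdf_to_text(Path(p)) for p in paths}), "invoice total")` returns the paths whose extracted text contains both words. A real indexer would persist the index instead of rebuilding it on every search.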

Craig Stuntz

The components/libraries mentioned in the answer to this question should do what you need.

StingyJack

It depends on the structure of the specific PDF.

If the PDF is made of images (scanned pages), you have to OCR each image and build a full-text index inside the PDF. (To check whether it is image-based, open it in Notepad and look for obj tags full of random characters.) There are a few utilities and applications that do this work for you; CVision PDF Compressor is one I have used before.
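As an alternative to eyeballing the file in Notepad, one could check for a text layer with the `pdffonts` tool from Poppler or Xpdf, which lists the fonts a PDF uses: a PDF with no fonts at all has no text to search and probably needs OCR. This is a minimal sketch assuming `pdffonts` is on the PATH and prints a header row and a dashed separator line before the font rows (its usual layout); `parse_pdffonts` and `looks_image_only` are hypothetical helper names.

```python
import subprocess


def parse_pdffonts(output: str) -> list[str]:
    """Return the font rows from pdffonts output: non-empty lines after the dashed separator."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("---"):
            return [row for row in lines[i + 1:] if row.strip()]
    return []  # no separator found: treat as "no fonts"


def looks_image_only(pdf_path: str) -> bool:
    """Heuristic: no fonts means no text layer, so the PDF probably needs OCR first."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return not parse_pdffonts(result.stdout)
```

A PDF that has already been OCR'd usually does list a font for its invisible text layer, so this check returns False for it, which is the desired answer: that file is already searchable.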

If the PDF is a standard text-based PDF, you may be able to open it like any other text file and search for the words, but only when its content streams are stored uncompressed.
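To illustrate why this works only sometimes, here is a minimal Python sketch with two hand-built fragments (not complete, valid PDF files): the same page content stored uncompressed and stored with the FlateDecode filter. A plain byte search, presumably similar to what the code in the question does for EXE or TXT files, finds the word only in the uncompressed one. `make_fragment` and `naive_contains` are hypothetical names.

```python
import zlib

# Hand-built page content: show "Hello World" in font F1, repeated to resemble a page.
CONTENT = b"BT /F1 12 Tf 72 712 Td (Hello World) Tj ET\n" * 10


def make_fragment(data: bytes, filter_entry: bytes = b"") -> bytes:
    """Wrap data in a single stream object (a fragment, not a complete PDF file)."""
    return (b"%PDF-1.4\n4 0 obj\n<< " + filter_entry + b" >>\nstream\n"
            + data + b"\nendstream\nendobj\n")


def naive_contains(pdf_bytes: bytes, word: str) -> bool:
    """Plain byte search over the raw file, as one would do for a TXT file."""
    return word.encode("latin-1") in pdf_bytes


plain_pdf = make_fragment(CONTENT)
flate_pdf = make_fragment(zlib.compress(CONTENT), b"/Filter /FlateDecode")

print(naive_contains(plain_pdf, "World"))  # True
print(naive_contains(flate_pdf, "World"))  # False: the text is deflate-compressed
```

Most PDF writers compress content streams by default, so the second case is the common one in practice.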

Here is a page that details some of the structure of a PDF, and this is an SO post on the same topic.

dirkgently

PDF is not just a binary representation of text. Think of it as a tree of objects, where each object node has some metadata and some content. Some of these objects contain string data and some don't; some are even encrypted, and some are compressed. So there's very little chance your string finder will work on an arbitrary PDF.
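A rough Python sketch of going one step further than a plain string finder: inflate each stream with zlib and pull the strings out of the Tj/TJ text-showing operators before searching. This is only an illustration under strong assumptions: it ignores encryption, filters other than FlateDecode, object streams, nested parentheses, octal escapes, and font encodings (many PDFs use CID fonts or custom encodings whose bytes are not readable text), which is why a real PDF library or pdftotext is the safer route. All names here are hypothetical.

```python
import re
import zlib

# Stream data lies between "stream<EOL>" and "endstream"; the lookbehind skips
# the "stream" inside "endstream".
STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.S)
# A literal string "(...)" without nested parentheses.
LITERAL = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.S)
# Text-showing operators: "[(...) n (...)] TJ" or "(...) Tj".
SHOW = re.compile(rb"\[((?:\\.|[^\\\]])*)\]\s*TJ|(\((?:\\.|[^\\()])*\))\s*Tj", re.S)


def decoded_streams(pdf: bytes):
    """Yield each stream's data, inflated when it is zlib data, raw otherwise."""
    for m in STREAM.finditer(pdf):
        data = m.group(1)
        try:
            # decompressobj tolerates the trailing EOL before "endstream".
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # uncompressed, or a filter this sketch does not handle
        yield data


def unescape(raw: bytes) -> str:
    """Undo only the simple escapes \\( \\) and \\\\; latin-1 stands in for real font encodings."""
    return re.sub(rb"\\([()\\])", rb"\1", raw).decode("latin-1")


def extract_text(pdf: bytes) -> str:
    """Join the strings shown by Tj/TJ operators; binary streams may add noise."""
    runs = []
    for content in decoded_streams(pdf):
        for m in SHOW.finditer(content):
            operand = m.group(1) if m.group(1) is not None else m.group(2)
            runs.append("".join(unescape(s) for s in LITERAL.findall(operand)))
    return " ".join(runs)


def pdf_contains(pdf: bytes, word: str) -> bool:
    """Case-insensitive search over the extracted text."""
    return word.lower() in extract_text(pdf).lower()
```

Usage would be along the lines of `pdf_contains(Path("file.pdf").read_bytes(), "invoice")`. Note that a TJ array such as `[(Wor) -20 (ld)] TJ` splits one word across several strings, which is one more reason a raw byte search misses words even in uncompressed files.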
