desi

Reputation: 833

Parsing pdf files

I need to split a large PDF document into smaller files based on the content of the file. We use BCL easyPDF to manipulate PDF files. easyPDF can split a PDF document at a given page number, but it cannot split the document based on its content. As far as I can tell, it also has no search function for determining where a piece of content is located (if I am wrong, please let me know).

How can I find the location of text in a PDF file using .NET?

Thanks

Upvotes: 10

Views: 29210

Answers (3)

Bobrovsky

Reputation: 14246

You might try Docotic.Pdf library for your task.

The library can extract text from PDFs (with or without formatting).

Alternatively, you can retrieve a collection of words with their bounding rectangles from a PDF. This should help you find the location of the text in the file.

Disclaimer: I work for the vendor of the library.

Upvotes: 3

Pablo Santa Cruz

Reputation: 181430

You need a PDF library for .NET, such as iTextSharp (the .NET port of iText).
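Once a library has given you the text of each page, deciding where to split is plain string work. Below is a minimal sketch of that step, under some assumptions: the page texts are already extracted into a list (with the .NET port of iText, version 5.x, `PdfTextExtractor.GetTextFromPage(reader, pageNumber)` is one way to get them), and every new section starts on a page containing a known marker string. The names `PdfSplitPlanner` and `PlanSplits` are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class PdfSplitPlanner
{
    // Returns inclusive, 1-based (firstPage, lastPage) ranges.
    // A new range starts at every page (after the first) whose text
    // contains the marker; page 1 always opens the first range.
    // pageTexts is assumed to come from whatever PDF library you use.
    public static List<Tuple<int, int>> PlanSplits(IList<string> pageTexts, string marker)
    {
        var ranges = new List<Tuple<int, int>>();
        if (pageTexts.Count == 0)
        {
            return ranges;
        }

        int start = 1;
        for (int page = 2; page <= pageTexts.Count; page++)
        {
            // Treat a page with no extractable text as empty.
            string text = pageTexts[page - 1] ?? string.Empty;
            if (text.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                // Close the range that ends just before this page.
                ranges.Add(Tuple.Create(start, page - 1));
                start = page;
            }
        }

        // Close the final range.
        ranges.Add(Tuple.Create(start, pageTexts.Count));
        return ranges;
    }
}
```

Each resulting range can then be handed to a page-number based splitter, such as the one easyPDF already provides. Note that this only finds the page a marker is on, not its position within the page; for coordinates you would need a library that reports text positions.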

Upvotes: 2

Brian

Reputation: 2229

Take a look at this question. It has links to some libraries that may satisfy your requirements:

How to programatically search a PDF document in c#

Upvotes: 1
