amit patel

Reputation: 2297

How to read text from a PDF file in a C#.NET web application

I am working on a project where I need to implement some functionality involving PDF files.

I want to read the text of a PDF file in my C#.NET project.

Does anyone know how to do this?

Upvotes: 3

Views: 22927

Answers (5)

user1082916

Reputation:

I would use the getText() method of PDFTextStripper (from Apache PDFBox). To see how to implement this, have a look at the following URLs:

http://naspinski.net/post/ParsingReading-a-PDF-file-with-C-and-AspNet-to-text.aspx

http://www.codeproject.com/Articles/12445/Converting-PDF-to-Text-in-C

Upvotes: 1

linkerro

Reputation: 5458

Short answer: no, unless you are generating the PDF yourself and doing it correctly.

PDF files are generated in a manner similar to what is sent to a printer. Not all text in them is machine-readable, and the information about the text can be stored in arbitrary ways. Some programs also save text as vector outlines or bitmaps.
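To illustrate the point: inside a page's content stream, text is a sequence of printer-like drawing operators, for example `BT /F1 12 Tf 72 712 Td (Hello) Tj ET` (begin text, choose font, move, show string, end text). The sketch below is a deliberately naive scanner, with a hypothetical class name, that only finds literal strings shown with the `Tj` operator in an *uncompressed* stream. It is not a real extractor: it misses `TJ` arrays, hex strings, compressed streams, custom font encodings, and text drawn as outlines or images, which is exactly why a proper library is needed.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only: NOT a real PDF text extractor.
public static class NaiveContentStreamScanner
{
    // Matches a literal string operand "( ... )" followed by the Tj operator.
    // Backslash escapes such as \( and \) inside the string are allowed;
    // balanced nested parentheses are not handled.
    private static readonly Regex ShowText =
        new Regex(@"\((?<s>(?:\\.|[^\\)])*)\)\s*Tj\b", RegexOptions.Compiled);

    // Returns the string of every "(...) Tj" operation in an uncompressed content stream.
    public static List<string> ExtractTjStrings(string contentStream)
    {
        var result = new List<string>();
        foreach (Match m in ShowText.Matches(contentStream))
        {
            result.Add(Unescape(m.Groups["s"].Value));
        }
        return result;
    }

    // Undoes the simple backslash escapes of PDF literal strings.
    // Octal escapes such as \101 are left as-is to keep the sketch short.
    private static string Unescape(string s)
    {
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '\\' && i + 1 < s.Length)
            {
                char next = s[++i];
                sb.Append(next switch
                {
                    'n' => '\n',
                    'r' => '\r',
                    't' => '\t',
                    _ => next // \( \) \\ and anything else: keep the escaped character
                });
            }
            else
            {
                sb.Append(s[i]);
            }
        }
        return sb.ToString();
    }
}
```

Calling `ExtractTjStrings` on `BT [(Hel) -20 (lo)] TJ ET`, which is an equally valid way to draw "Hello" with kerning, returns nothing.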

Upvotes: 0

Have a look at the following links:

How to read pdf files using C# .NET

and

Reading PDF in C#

Hopefully they will point you in the right direction.

Upvotes: 3

Alex

Reputation: 6149

Try this library; it is very easy to use and does exactly what you need:

http://www.codeproject.com/Articles/14170/Extract-Text-from-PDF-in-C-100-NET
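Any pure-.NET extractor first has to undo stream compression, because page content streams are typically stored with `/Filter /FlateDecode`, and until they are inflated no text operators are visible at all. As an illustration (not code from the linked article), the sketch below assumes .NET 6 or later for `System.IO.Compression.ZLibStream`, a single FlateDecode filter with no `/DecodeParms` predictor, and a hypothetical class name.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only (requires .NET 6+ for ZLibStream).
public static class FlateDecodeSketch
{
    // Inflates the raw bytes found between "stream" and "endstream" of a PDF object
    // whose dictionary declares /Filter /FlateDecode. That data is zlib-wrapped
    // deflate (RFC 1950), which is the format ZLibStream reads.
    // Assumption: no /DecodeParms predictor and no other filter in the chain.
    public static byte[] Inflate(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }

    // Produces zlib-wrapped data the way a PDF writer might; useful for testing Inflate.
    public static byte[] Deflate(byte[] raw)
    {
        var output = new MemoryStream();
        using (var zlib = new ZLibStream(output, CompressionLevel.Optimal))
        {
            zlib.Write(raw, 0, raw.Length);
        } // disposing the ZLibStream writes the final block and the Adler-32 checksum
        return output.ToArray(); // ToArray still works after the MemoryStream is closed
    }
}
```

The inflated bytes are only the page's drawing operators; they still have to be interpreted together with each font's encoding before they become readable text, which is the part a library handles for you.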

Upvotes: 1

Niels

Reputation: 1046

Perhaps PDFlib can be used.

From the PDFlib homepage:

PDFlib TET PDF IFilter (Enterprise PDF Search on Windows) extracts text and metadata from PDF documents and makes it available to search and retrieval software on Windows.

Upvotes: 1
