Reputation: 2297
I am working on a project that needs some functionality implemented for PDF files.
I want to read the text of a PDF file in my C#/.NET project.
Does anyone know how to do this?
Upvotes: 3
Views: 22927
Reputation:
I would use the getText() method of PDFTextStripper (from PDFBox). To implement this, have a look at the following URLs:
http://naspinski.net/post/ParsingReading-a-PDF-file-with-C-and-AspNet-to-text.aspx
http://www.codeproject.com/Articles/12445/Converting-PDF-to-Text-in-C
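A minimal sketch of that approach, assuming the IKVM-compiled .NET build of PDFBox 1.x is referenced in the project. The namespace of PDFTextStripper differs between PDFBox versions, so the `using` lines are an assumption about the version in use, and the class name `PdfTextDemo` and the command-line path are placeholders:

```csharp
using System;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;   // PDFTextStripper is in this package in PDFBox 1.x (assumed version)

class PdfTextDemo
{
    // Loads the PDF at the given path and returns whatever text PDFBox can extract.
    static string ExtractText(string path)
    {
        PDDocument doc = null;
        try
        {
            doc = PDDocument.load(path);
            var stripper = new PDFTextStripper();
            return stripper.getText(doc);
        }
        finally
        {
            // PDDocument holds file resources, so close it even if extraction fails.
            if (doc != null) doc.close();
        }
    }

    static void Main(string[] args)
    {
        // args[0] is the path to the PDF file (placeholder for your own input).
        Console.WriteLine(ExtractText(args[0]));
    }
}
```

The result depends on how the PDF was produced; scanned or outlined text will come back empty or garbled.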
Upvotes: 1
Reputation: 5458
Short answer: no, unless you are generating the PDF yourself and doing it correctly.
PDF files are generated in a manner similar to what is sent to a printer. Not all text in them is readable, and the information about the text can be stored arbitrarily. Also, some programs save the text as vector outlines or bitmaps.
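To make this concrete: inside a page's content stream, text is drawn piece by piece with operators such as Tj, each positioned separately. The sketch below scans a hand-written, uncompressed fragment (a made-up sample, not taken from a real file) for literal strings shown with Tj. The names `NaiveTjScanner` and `FindShownStrings` are hypothetical, and the scan deliberately ignores compression, escapes, TJ arrays, and font encodings, which are exactly the things that make real extraction unreliable:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NaiveTjScanner
{
    // Matches a literal string followed by the Tj operator, e.g. "(Hello) Tj".
    // Nested or escaped parentheses are not handled.
    static readonly Regex TjPattern = new Regex(@"\(([^()\\]*)\)\s*Tj");

    // Returns the literal strings shown with Tj, in the order they appear.
    public static List<string> FindShownStrings(string contentStream)
    {
        var result = new List<string>();
        foreach (Match m in TjPattern.Matches(contentStream))
            result.Add(m.Groups[1].Value);
        return result;
    }
}

class Program
{
    static void Main()
    {
        // One word split into two separately positioned fragments.
        string sample = "BT /F1 12 Tf 72 720 Td (Hel) Tj 20 0 Td (lo) Tj ET";
        foreach (string s in NaiveTjScanner.FindShownStrings(sample))
            Console.WriteLine(s);
    }
}
```

Even in this tiny sample the word arrives as two fragments, and nothing in the stream says whether they belong together; an extractor has to guess that from the positions.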
Upvotes: 0
Reputation: 800
Have a look at the following links:
How to read pdf files using C# .NET
and
Hopefully they can guide you in the right direction.
Upvotes: 3
Reputation: 6149
Try this library; it is very easy to use and exactly what you need:
http://www.codeproject.com/Articles/14170/Extract-Text-from-PDF-in-C-100-NET
Upvotes: 1
Reputation: 1046
Perhaps PDFlib can be used.
From the PDFlib homepage:
PDFlib TET PDF IFilter (Enterprise PDF Search on Windows) extracts text and metadata from PDF documents and makes it available to search and retrieval software on Windows.
Upvotes: 1