Identify paragraph and/or page boundaries of extracted text from PDF documents using C#

Question

I am developing an application and need to identify paragraphs in a PDF.
I need to extract the text and work out where each paragraph starts and ends.
Is there any way to extract text and identify paragraph and/or page boundaries of the extracted text from PDF documents using C#?

hogarth45 · Accepted Answer

PDF is a binary format, so you need a library to read it. Try one of these:
http://www.pdflib.com/
http://sourceforge.net/projects/itextsharp/

Once you have read the text in, you should be able to check for
line breaks/returns (\n / \r) or tabs to find new paragraphs.
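
As a rough sketch of that idea, the helper below splits the text of one page into paragraphs. The class name ParagraphSplitter and the rule it applies (a blank line, or a line starting with a tab or two spaces, begins a new paragraph) are assumptions for illustration, not part of either library. Whether blank lines and indentation survive extraction depends on the PDF and on how the library rebuilds the text, so the rule will likely need tuning.

```csharp
using System;
using System.Collections.Generic;

public static class ParagraphSplitter
{
    // Hypothetical helper: splits the extracted text of ONE page into paragraphs.
    // Assumed heuristic: a blank line, or a line that begins with a tab or
    // at least two spaces, marks the start of a new paragraph.
    public static List<string> SplitParagraphs(string pageText)
    {
        var paragraphs = new List<string>();
        var current = new List<string>();

        // Normalize "\r\n" and lone "\r" to "\n" before splitting into lines.
        string[] lines = pageText.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n');

        foreach (string line in lines)
        {
            bool blank = line.Trim().Length == 0;
            bool indented = line.StartsWith("\t", StringComparison.Ordinal)
                         || line.StartsWith("  ", StringComparison.Ordinal);

            // Close the paragraph collected so far when a boundary is seen.
            if ((blank || indented) && current.Count > 0)
            {
                paragraphs.Add(string.Join(" ", current));
                current.Clear();
            }

            if (!blank)
            {
                current.Add(line.Trim());
            }
        }

        // Flush the last paragraph on the page.
        if (current.Count > 0)
        {
            paragraphs.Add(string.Join(" ", current));
        }

        return paragraphs;
    }
}
```

With iTextSharp 5, the text of page i can be obtained with PdfTextExtractor.GetTextFromPage(reader, i) for i from 1 to reader.NumberOfPages, where reader is a PdfReader. Extracting one page at a time gives you the page boundaries directly, and each page's text can then be passed to SplitParagraphs.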
