user3663610

Reputation: 57

Identify paragraph and/or page boundaries in text extracted from PDF documents using C#

I am developing an application that needs to identify paragraphs in PDF files.
I need to extract the text and work out where each paragraph begins and ends.
Is there any way to extract text from PDF documents and identify paragraph and/or page boundaries in it using C#?

Upvotes: 3

Views: 619

Answers (1)

hogarth45

Reputation: 3677

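PDF is a binary format, so use a library such as one of these to read it:
http://www.pdflib.com/
http://sourceforge.net/projects/itextsharp/

Once you have the extracted text, you should be able to check for
line breaks/returns (\n, \r) or tabs (\t) to find new paragraphs.
Keep in mind that extracted text often has a line break at the end of every
visual line, so a blank line or an indented line tends to be a more reliable
paragraph marker than a single line break.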
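
A minimal sketch of the line-break check, assuming the library has already given you the text of each page as a plain string (reading page by page is also what gives you the page boundaries). The names ParagraphSplitter, SplitParagraphs and SplitPages are hypothetical, and the rule "a blank line or an indented line starts a new paragraph" is a heuristic assumption, not something stored in the PDF, so expect to tune it for your documents.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: groups extracted lines into paragraphs.
public static class ParagraphSplitter
{
    // Splits the text of one page into paragraphs.
    // Assumption: a blank line, or a line starting with a tab or two spaces,
    // marks the start of a new paragraph; other line breaks are line wraps.
    public static List<string> SplitParagraphs(string pageText)
    {
        var paragraphs = new List<string>();
        var current = new StringBuilder();

        // Normalize \r\n and bare \r to \n before splitting into lines.
        string[] lines = pageText.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n');

        foreach (string rawLine in lines)
        {
            bool isBlank = rawLine.Trim().Length == 0;
            bool isIndented = !isBlank &&
                (rawLine[0] == '\t' || rawLine.StartsWith("  ", StringComparison.Ordinal));

            // Close the paragraph in progress when a boundary is seen.
            if ((isBlank || isIndented) && current.Length > 0)
            {
                paragraphs.Add(current.ToString());
                current.Clear();
            }
            if (isBlank) continue;

            // Join wrapped lines of the same paragraph with a single space.
            if (current.Length > 0) current.Append(' ');
            current.Append(rawLine.Trim());
        }

        if (current.Length > 0) paragraphs.Add(current.ToString());
        return paragraphs;
    }

    // Tags each paragraph with the 1-based number of the page it came from.
    // Note: a paragraph that continues across a page break is split in two.
    public static List<KeyValuePair<int, string>> SplitPages(IEnumerable<string> pageTexts)
    {
        var result = new List<KeyValuePair<int, string>>();
        int pageNumber = 1;
        foreach (string pageText in pageTexts)
        {
            foreach (string paragraph in SplitParagraphs(pageText))
                result.Add(new KeyValuePair<int, string>(pageNumber, paragraph));
            pageNumber++;
        }
        return result;
    }
}
```

You would call SplitPages with one string per page, in page order. If the extractor drops leading whitespace or blank lines, this heuristic has nothing to work with, and you would need position information (for example, vertical gaps between lines) from the library instead.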

Upvotes: 1
