Reputation: 57
I am developing an application and I need to identify paragraphs in a PDF.
I need to extract the text and work out where each paragraph begins and ends.
Is there any way to extract text from PDF documents and identify the paragraph and/or page boundaries of the extracted text using C#?
Upvotes: 3
Views: 619
Reputation: 3677
PDF is a binary format, so you cannot read it as plain text. Try using one of these libraries to read it in:
http://www.pdflib.com/
http://sourceforge.net/projects/itextsharp/
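Once you have the text extracted, you should be able to check for
line breaks/returns (\n, \r) or tabs (\t) to find where new paragraphs start.
A PDF stores positioned text rather than paragraphs, so how well this works
depends on how the library emits line breaks.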
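As a rough sketch of that idea, the C# below splits already-extracted page text into paragraphs. It makes no PDF library calls: it assumes you have obtained the text of each page as a string yourself (for example, iTextSharp 5.x has PdfTextExtractor.GetTextFromPage(reader, pageNumber), and calling it once per page gives you the page boundaries). The ParagraphSplitter class and the rule "a blank line ends a paragraph, a line starting with a tab begins one" are my own assumptions for illustration, not something either library defines, and real PDFs may need a different heuristic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: not part of iTextSharp or PDFlib.
public static class ParagraphSplitter
{
    // Splits the extracted text of one page into paragraphs.
    // Assumption: a blank line ends a paragraph, and a line that
    // starts with a tab begins a new one.
    public static List<string> Split(string pageText)
    {
        var paragraphs = new List<string>();
        var current = new List<string>();

        // Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
        string normalized = pageText.Replace("\r\n", "\n").Replace('\r', '\n');

        foreach (string line in normalized.Split('\n'))
        {
            bool blank = line.Trim().Length == 0;
            bool indented = line.Length > 0 && line[0] == '\t';

            // A blank or tab-indented line closes the paragraph in progress.
            if ((blank || indented) && current.Count > 0)
            {
                paragraphs.Add(string.Join(" ", current));
                current.Clear();
            }

            // Non-blank lines are appended to the current paragraph.
            if (!blank)
            {
                current.Add(line.Trim());
            }
        }

        // Flush whatever is left at the end of the page.
        if (current.Count > 0)
        {
            paragraphs.Add(string.Join(" ", current));
        }

        return paragraphs;
    }
}
```

Calling ParagraphSplitter.Split once per page keeps the paragraphs grouped by page. If the library you use returns every visual line with a line break and never a blank line, this rule will put the whole page into one paragraph, and you would need something else, such as line spacing or indentation taken from the text positions.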
Upvotes: 1