Reputation: 163
I'm having trouble reading a PDF that has a header and footer, with the body laid out in 2 columns.
I already have the column widths and the height of the header, but I need code that reads the pages column by column.
Can anyone provide a piece of code that reads a PDF with columns?
Thank you.
Upvotes: 1
Views: 2963
Reputation: 77528
It's very hard to achieve what you want if you don't know the position of the columns, but I assume that you have their coordinates because you say "I already have the column widths and height". In that case, your question isn't that different from this other question posted on StackOverflow: iTextSharp read from specific position
Suppose that rect is a Rectangle corresponding with the position of a column, then you need this code:
// Only text that falls inside rect is passed on to the extraction strategy.
RenderFilter[] filter = {new RegionTextRenderFilter(rect)};
ITextExtractionStrategy strategy = new FilteredTextRenderListener(
    new LocationTextExtractionStrategy(), filter);
// i is the 1-based page number.
string single_column = PdfTextExtractor.GetTextFromPage(reader, i, strategy);
Now you have the text in a single column. You need to repeat this for every column on your page.
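To illustrate the "repeat this for every column" step, here is a minimal sketch that builds one rectangle per column from the known widths and the header height, then extracts each column on each page. The class ColumnReader, the method ReadColumns, and the parameters leftMargin, headerHeight and footerHeight are hypothetical names chosen for this example. It also assumes that all values are in points, that the columns sit directly next to each other starting at leftMargin, and that the pages are not rotated. Adjust the arithmetic if your layout has a gutter between columns.

```csharp
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class ColumnReader
{
    // Hypothetical helper: returns the text of every column of every page,
    // in reading order (page 1 column 1, page 1 column 2, ...).
    public static List<string> ReadColumns(string path, float[] columnWidths,
        float leftMargin, float headerHeight, float footerHeight)
    {
        var result = new List<string>();
        PdfReader reader = new PdfReader(path);
        try
        {
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                // PDF coordinates start at the lower-left corner; y grows upward.
                Rectangle page = reader.GetPageSize(i);
                float bottom = page.Bottom + footerHeight; // skip the footer
                float top = page.Top - headerHeight;       // skip the header
                float left = page.Left + leftMargin;

                foreach (float width in columnWidths)
                {
                    // Assumption: columns are adjacent, with no gutter.
                    Rectangle rect = new Rectangle(left, bottom, left + width, top);
                    RenderFilter[] filter = { new RegionTextRenderFilter(rect) };
                    ITextExtractionStrategy strategy = new FilteredTextRenderListener(
                        new LocationTextExtractionStrategy(), filter);
                    result.Add(PdfTextExtractor.GetTextFromPage(reader, i, strategy));
                    left += width;
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return result;
    }
}
```

A new strategy object is created for each column because LocationTextExtractionStrategy accumulates the text it receives; reusing one instance would mix the columns together.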
Extra comment: While in most cases using the RegionTextRenderFilter will work just fine, a few cases (in which columns are created by simply inserting additional space characters in the lines) might require splitting the text chunks in advance, before they are filtered. This can be done e.g. by using the TextRenderInfoSplitter from this answer and wrapping the FilteredTextRenderListener in it. (This comment was provided by mkl.)
Upvotes: 1