Marco Araujo

Reputation: 163

iTextSharp - Reading PDF with 2 columns

I'm having trouble reading a PDF that has a header and a footer, with a body laid out in 2 columns.

I already have the column widths and the height of the header, but I need code that reads the pages column by column.

Can anyone provide a piece of code that reads a PDF with columns?

Thank you.

Upvotes: 1

Views: 2963

Answers (1)

Bruno Lowagie

Reputation: 77528

It's very hard to achieve what you want if you don't know the position of the columns, but I assume that you have their coordinates because you say "I already have the column widths and height". In that case, your question isn't that different from this other question posted on StackOverflow: iTextSharp read from specific position

Suppose that rect is a Rectangle corresponding to the position of a column. Then you need this code:

// rect: the column's region; reader: an open PdfReader; i: the 1-based page number
RenderFilter[] filter = {new RegionTextRenderFilter(rect)};
ITextExtractionStrategy strategy = new FilteredTextRenderListener(
    new LocationTextExtractionStrategy(), filter);
string singleColumn = PdfTextExtractor.GetTextFromPage(reader, i, strategy);

Now you have the text of a single column. You need to repeat this for every column on every page.
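To illustrate the repetition, here is a minimal sketch that loops over the pages and over two column rectangles per page. It is only a sketch under assumptions that are not in the question: the class name TwoColumnReader, the HeaderHeight and FooterHeight values, and the split at the horizontal middle of the page are all hypothetical, so substitute the column widths and header height you already measured. It also assumes unrotated pages and iTextSharp 5.x.

```csharp
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class TwoColumnReader
{
    // Hypothetical layout values in PDF points; replace with your own measurements.
    const float HeaderHeight = 60f;
    const float FooterHeight = 40f;

    // Returns the text of every column, in reading order:
    // page 1 left, page 1 right, page 2 left, and so on.
    public static List<string> ReadColumns(string path)
    {
        var columns = new List<string>();
        PdfReader reader = new PdfReader(path);
        try
        {
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                Rectangle page = reader.GetPageSize(i);

                // Assumption: two equally wide columns that meet at the page's middle.
                float middle = (page.Left + page.Right) / 2f;

                // PDF coordinates grow upward, so the header sits at the top edge.
                float bottom = page.Bottom + FooterHeight;
                float top = page.Top - HeaderHeight;

                Rectangle[] rects =
                {
                    new Rectangle(page.Left, bottom, middle, top),   // left column
                    new Rectangle(middle, bottom, page.Right, top)   // right column
                };

                foreach (Rectangle rect in rects)
                {
                    // A strategy accumulates text, so create a fresh one per column.
                    RenderFilter[] filter = { new RegionTextRenderFilter(rect) };
                    ITextExtractionStrategy strategy = new FilteredTextRenderListener(
                        new LocationTextExtractionStrategy(), filter);
                    columns.Add(PdfTextExtractor.GetTextFromPage(reader, i, strategy));
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return columns;
    }
}
```

Calling TwoColumnReader.ReadColumns("input.pdf") would then give you one string per column. If your pages are rotated, GetPageSizeWithRotation may be the better starting point for computing the rectangles.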

Extra comment: While in most cases using the RegionTextRenderFilter will work just fine, a few cases (in which columns are created by simply inserting additional space characters in the lines) might require splitting the text chunks before they are processed. This can be done e.g. by using the TextRenderInfoSplitter from this answer and wrapping the FilteredTextRenderListener in it. (This comment was provided by mkl.)

Upvotes: 1
