Read columns of a PDF in C# using iTextSharp

Question

My program extracts text from a PDF file with iTextSharp, and it works well for single-column pages. iTextSharp returns the text line by line. However, when a page has two columns, the result is wrong: each extracted line joins the text of both columns.

My problem is: How can I extract text column by column?

My code is below. The PDF files are in Arabic. Sorry, my English is not very good.

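// Requires: using System.Text; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
PdfReader reader = new PdfReader(@"D:\test pdf\Blood Journal.pdf");
int intPageNum = reader.NumberOfPages;
string text;
string[] words;
string line;

for (int i = 1; i <= intPageNum; i++)
{
    // Extract the whole page; text from both columns ends up on the same line.
    text = PdfTextExtractor.GetTextFromPage(reader, i,
               new LocationTextExtractionStrategy());

    // One array entry per extracted line.
    words = text.Split('\n');
    for (int j = 0, len = words.Length; j < len; j++)
    {
        // Note: this UTF-8 round trip returns the same string unchanged.
        line = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(words[j]));
        // other things here
    }

    // other things here
}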


Answers (1)
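
One possible approach, sketched here under assumptions rather than as a definitive solution: restrict extraction to one rectangular region at a time by wrapping LocationTextExtractionStrategy in a FilteredTextRenderListener with a RegionTextRenderFilter, then call it once per column. The sketch assumes every page has exactly two equal-width columns split at the horizontal midpoint of the page, and that the right column should be read first because the text is Arabic. The class name ColumnExtractor and the helper GetTextInRegion are hypothetical names chosen for illustration; adjust the rectangles to match the real column boundaries in your files.

```csharp
using System;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static class ColumnExtractor
{
    // Hypothetical helper: returns only the text that falls inside 'region'.
    static string GetTextInRegion(PdfReader reader, int pageNum, Rectangle region)
    {
        RenderFilter[] filters = { new RegionTextRenderFilter(region) };
        ITextExtractionStrategy strategy = new FilteredTextRenderListener(
            new LocationTextExtractionStrategy(), filters);
        return PdfTextExtractor.GetTextFromPage(reader, pageNum, strategy);
    }

    static void Main()
    {
        PdfReader reader = new PdfReader(@"D:\test pdf\Blood Journal.pdf");
        try
        {
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                Rectangle page = reader.GetPageSize(i);

                // Assumption: two equal columns, split at the page midpoint.
                float midX = page.Left + page.Width / 2f;
                Rectangle leftHalf = new Rectangle(page.Left, page.Bottom, midX, page.Top);
                Rectangle rightHalf = new Rectangle(midX, page.Bottom, page.Right, page.Top);

                // Assumption: right-to-left layout, so the right column comes first.
                string firstColumn = GetTextInRegion(reader, i, rightHalf);
                string secondColumn = GetTextInRegion(reader, i, leftHalf);

                foreach (string line in (firstColumn + "\n" + secondColumn).Split('\n'))
                {
                    // Process each line of a single column here.
                    Console.WriteLine(line);
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

A fixed midpoint split is the simplest choice, but it only works when the column gutter really sits at the center of the page. Full-width elements such as titles or tables that cross the midpoint may come out split, duplicated, or misplaced, so pages with mixed layouts would need their own rectangles.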
