user421125

Problem with PdfTextExtractor in itext!

First, please excuse my English. I want to search a PDF document for a word such as "Hello", so I need to read each page of the PDF with PdfTextExtractor. Reading a single page works: I can extract all the words on that page and save them in a string buffer. But when I put this code in a for loop (for example, to search pages 1 to 7), the words from earlier pages remain in the string buffer. I hope the problem is clear. Thanks, all. This is my code:

        PdfReader reader2 = new PdfReader(openFileDialog1.FileName);
        int pagen = reader2.NumberOfPages;
        reader2.Close();
        ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
        for (int i = 1; i < pagen; i++)
        {
            textBox1.Text = "";
            PdfReader reader = new PdfReader(openFileDialog1.FileName);

            String  s = PdfTextExtractor.GetTextFromPage(reader, i, its);
            //MessageBox.Show(s.Length.ToString());
            //PdfTextArray h = new PdfTextArray(s);

            //
            // s = "";
            s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
            textBox1.Text = s;
            reader.Close();

        }

Upvotes: 2

Views: 11933

Answers (3)

krazy

Reputation: 21

There is another potential problem in the statement which controls your loop:

for (int i = 1; i < pagen; i++)

Page numbers start at 1, so with this condition the last page is never read, and if pagen = 1 the loop is not executed at all. It should read:

for (int i = 1; i <= pagen; i++)

Upvotes: 2

ShravankumarKumar

Reputation: 2035

public string ReadPdfFile(object Filename, DataTable ReadLibray)
    {
     // Open the document once and reuse the same reader for every page.
     PdfReader reader = new PdfReader((string)Filename);
     StringBuilder strText = new StringBuilder();
     try
     {
         for (int page = 1; page <= reader.NumberOfPages; page++)
         {
             // New strategy for each page, so text from earlier pages is not carried over.
             ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
             String s = PdfTextExtractor.GetTextFromPage(reader, page, its);

             s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
             strText.Append(s);
         }
     }
     finally
     {
         // Close the reader even if extraction throws.
         reader.Close();
     }
     return strText.ToString();
    }

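This code is helpful for reading a PDF with iText. It creates a new extraction strategy for each page, so text from earlier pages does not accumulate.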

Upvotes: 0

Mark Storer

Reputation: 15868

Unfortunately, SimpleTextExtractionStrategy cannot be reset: it keeps accumulating the text of every page it processes. You must therefore move your "new SimpleTextExtractionStrategy()" inside the loop instead of reusing the same object.
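
A minimal sketch of that change applied to the word search from the question, assuming iTextSharp 5.x. The names PdfWordSearch, PageContains and FindPages are hypothetical and chosen only for illustration. The case-insensitive match is also an assumption, since the question does not say how the word should be compared.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfWordSearch
{
    // Hypothetical helper: true when pageText contains word, ignoring case.
    public static bool PageContains(string pageText, string word)
    {
        return pageText.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Hypothetical helper: returns the 1-based numbers of the pages that contain word.
    public static List<int> FindPages(string path, string word)
    {
        List<int> hits = new List<int>();

        // Open the document once; the same reader serves every page.
        PdfReader reader = new PdfReader(path);
        try
        {
            // Page numbers are 1-based, so the bound is <= NumberOfPages.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A new strategy per page: the old one cannot be reset and
                // would still hold the text of the pages already processed.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string text = PdfTextExtractor.GetTextFromPage(reader, page, strategy);

                if (PageContains(text, word))
                {
                    hits.Add(page);
                }
            }
        }
        finally
        {
            // Release the file even if extraction throws.
            reader.Close();
        }
        return hits;
    }
}
```

With the question's file dialog, a call could look like PdfWordSearch.FindPages(openFileDialog1.FileName, "Hello"). Opening the reader once outside the loop is a side improvement and not part of the fix itself; the essential change is creating the strategy inside the loop.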

Upvotes: 5
