itextsharp extract text pdf not working

Question

I'm having trouble getting the text from the page.

I get an "Object reference not set to an instance of an object" error on the following line:

String extractText = PdfTextExtractor.GetTextFromPage(pdfReader, i);

The full code is below:

 var pdfText = new StringBuilder();
 using (var pdfReader = new PdfReader(cbPdf.SelectedValue + ""))
 {
      for (var i = 0; i <= pdfReader.NumberOfPages; i++)
      {
         String extractText = PdfTextExtractor.GetTextFromPage(pdfReader, i);
         extractText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(extractText)));
         pdfText.Append(extractText);
      }
 }
 rtxtTexto.Text = pdfText.ToString();

mkl · Accepted Answer

iText numbers pages 1-based, i.e. the first page has number 1.

You already did take that into account at the end of your loop (by comparing using <=), merely not at the start (where you start at 0).

Thus,

for (var i = 1; i <= pdfReader.NumberOfPages; i++)
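For reference, here is a minimal sketch of the whole extraction with the 1-based loop in place. It assumes iTextSharp 5.x; the class name `PdfTextHelper`, the method name `ExtractAllText` and the `path` parameter are only illustrative and not part of the question's code. The encoding conversion line from the question is left out here.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfTextHelper
{
    // Hypothetical helper: returns the text of all pages of the PDF at 'path'.
    public static string ExtractAllText(string path)
    {
        var pdfText = new StringBuilder();
        using (var pdfReader = new PdfReader(path))
        {
            // iText page numbers start at 1 and end at NumberOfPages (inclusive).
            for (var i = 1; i <= pdfReader.NumberOfPages; i++)
            {
                pdfText.Append(PdfTextExtractor.GetTextFromPage(pdfReader, i));
            }
        }
        return pdfText.ToString();
    }
}
```

In the question's code this would be called as something like `rtxtTexto.Text = PdfTextHelper.ExtractAllText(cbPdf.SelectedValue + "");`.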

That being said, as far as I know your line

extractText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(extractText)));

is nonsense.
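To illustrate why: a .NET `string` is already Unicode, so encoding it to bytes, converting those bytes to UTF-8 and decoding them again can at best give back the same string, and at worst lose characters the intermediate encoding cannot represent. The sketch below shows this under one assumption: `Encoding.ASCII` stands in for `Encoding.Default`, whose actual value depends on the platform. The class and method names are illustrative only.

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Same chain of calls as in the question, with the intermediate encoding as a parameter.
    public static string RoundTrip(string s, Encoding legacy)
    {
        return Encoding.UTF8.GetString(
            Encoding.Convert(legacy, Encoding.UTF8, legacy.GetBytes(s)));
    }

    public static void Main()
    {
        // Representable characters come back unchanged, so the conversion did nothing.
        Console.WriteLine(RoundTrip("cafe", Encoding.ASCII)); // cafe
        // Characters the intermediate encoding cannot represent are replaced by '?'.
        Console.WriteLine(RoundTrip("café", Encoding.ASCII)); // caf?
    }
}
```

Either way the line does not improve the extracted text, so it can simply be dropped.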
