D.King

Reputation: 23

Determine if a PDF is searchable

How can I determine programmatically whether a PDF is searchable (contains real text) or is a scanned, image-only PDF? I know similar questions have been asked, but some of them were not answered properly. This is how I currently open the file:

    if (openPdfFileDialog.ShowDialog() == System.Windows.Forms.DialogResult.OK)
    {
        // Full path of the PDF the user selected
        string strfilename = openPdfFileDialog.FileName;

        // Enable the controls that only make sense once a file is loaded
        pdfImageBox.Enabled = true;
        btnSave.Enabled = true;
        txt_Save.Enabled = true;
        btnAdd.Enabled = true;
        txtOcr1.Enabled = true;

        // Load the document and show its path in the UI
        this.OpenPDF(strfilename);
        ext.Text = strfilename;
        txt_Save.Text = ext.Text;
    }

Upvotes: 2

Views: 4232

Answers (1)

Karthikeyan

Reputation: 136

If the PDF document contains only scanned images, it won't have any text in it. You can extract the text from the PDF document, and if the result is an empty string for every page, you can conclude that it is a scanned PDF.
https://help.syncfusion.com/file-formats/pdf/working-with-text-extraction
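To illustrate the check itself, here is a minimal sketch that does not depend on any particular PDF library. It assumes you have already extracted the text of each page (with the library linked above or any other) into a sequence of strings. `HasSearchableText` and its `minChars` parameter are hypothetical names of mine, and treating whitespace-only output as "no text" is my assumption, not something the library documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PdfTextCheck
{
    // Hypothetical helper: pageTexts holds the text extracted from each page
    // by whatever PDF library you use. Returns true when the document has at
    // least minChars non-whitespace characters in total.
    public static bool HasSearchableText(IEnumerable<string> pageTexts, int minChars = 1)
    {
        if (pageTexts == null) throw new ArgumentNullException(nameof(pageTexts));

        // Count non-whitespace characters across all pages; null entries
        // (pages where extraction produced nothing) are skipped.
        int count = pageTexts
            .Where(t => t != null)
            .Sum(t => t.Count(c => !char.IsWhiteSpace(c)));

        return count >= minChars;
    }

    public static void Main()
    {
        // Only empty or whitespace output: treated as a scanned PDF.
        Console.WriteLine(HasSearchableText(new[] { "", "  \n" }));       // False

        // At least one page yields real characters: treated as searchable.
        Console.WriteLine(HasSearchableText(new[] { "", "Invoice 42" })); // True
    }
}
```

Raising `minChars` may help if scanned files carry a few stray characters (for example a stamped page number), but the right threshold depends on your documents.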

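Alternatively, if you can assume that your searchable PDFs never contain images, you can run image extraction instead. If images are present, the PDF document consists of scanned images. Note that this assumption breaks down for searchable PDFs that include pictures or logos.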
https://help.syncfusion.com/file-formats/pdf/working-with-image-extraction
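Since neither signal is fully reliable on its own, one option is to combine them. The sketch below is only an illustration: `PdfKind` and `Classify` are hypothetical names, and the two booleans are assumed to come from the text and image extraction steps described above. A file with both text and images is reported as `Mixed`, because it could be a scan with an OCR text layer or an ordinary document with figures.

```csharp
using System;

// Hypothetical result type for the combined check.
public enum PdfKind { Searchable, LikelyScanned, Mixed, Unknown }

public static class PdfClassifier
{
    // hasText:   text extraction returned non-empty text for some page.
    // hasImages: image extraction returned at least one image.
    public static PdfKind Classify(bool hasText, bool hasImages)
    {
        if (hasText && !hasImages) return PdfKind.Searchable;    // text only
        if (!hasText && hasImages) return PdfKind.LikelyScanned; // images only
        if (hasText && hasImages) return PdfKind.Mixed;          // ambiguous
        return PdfKind.Unknown;                                  // neither found
    }

    public static void Main()
    {
        Console.WriteLine(Classify(hasText: false, hasImages: true)); // LikelyScanned
        Console.WriteLine(Classify(hasText: true, hasImages: true));  // Mixed
    }
}
```

For the `Mixed` and `Unknown` cases you would probably need a manual look or a further heuristic; this sketch does not try to resolve them.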

Note: I work for Syncfusion.

Upvotes: 2
