Reputation: 23
How can I determine programmatically whether a PDF is searchable or a scanned PDF? I know similar questions have been asked, but some of them have not been answered properly.
if (openPdfFileDialog.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
    string strfilename = openPdfFileDialog.FileName;

    // Enable the controls that operate on the loaded document.
    pdfImageBox.Enabled = true;
    btnSave.Enabled = true;
    txt_Save.Enabled = true;
    btnAdd.Enabled = true;
    txtOcr1.Enabled = true;

    // Load the selected PDF and show its path.
    this.OpenPDF(strfilename);
    ext.Text = strfilename;
    txt_Save.Text = ext.Text;
}
Upvotes: 2
Views: 4232
Reputation: 136
If a PDF document contains only scanned images, it will not contain any extractable text. We can extract the text from the PDF document and, if the result is an empty string, conclude that it is a scanned PDF.
https://help.syncfusion.com/file-formats/pdf/working-with-text-extraction
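A minimal sketch of the text-based check, assuming the Syncfusion PDF library for Windows Forms is referenced, where `PdfLoadedDocument` loads an existing file and `PdfPageBase.ExtractText()` returns a page's text. The class name `PdfTextCheck` and method `LooksScanned` are hypothetical. It treats whitespace-only output as "no text", since extraction can return line breaks even for image-only pages.

```csharp
using System;
using Syncfusion.Pdf;          // PdfPageBase
using Syncfusion.Pdf.Parsing;  // PdfLoadedDocument

// Hypothetical helper; not part of the Syncfusion API.
public static class PdfTextCheck
{
    // Returns true when no page yields any non-whitespace text, which under
    // the heuristic above suggests an image-only (scanned) PDF.
    public static bool LooksScanned(string path)
    {
        PdfLoadedDocument document = new PdfLoadedDocument(path);
        try
        {
            for (int i = 0; i < document.Pages.Count; i++)
            {
                PdfPageBase page = document.Pages[i];

                // Any extractable text means this page has a text layer,
                // so the document is at least partly searchable.
                if (!string.IsNullOrWhiteSpace(page.ExtractText()))
                    return false;
            }
            return true;
        }
        finally
        {
            // Release the loaded document and its file handle.
            document.Close(true);
        }
    }
}
```

In the question's handler this could be called right after the dialog returns, e.g. `if (PdfTextCheck.LooksScanned(strfilename)) { /* send to OCR */ }`. Stopping at the first page with text keeps the check cheap for searchable documents.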
Alternatively, assuming your searchable PDFs do not contain images, you can extract the images instead: if any images are present, the PDF document contains scanned images.
https://help.syncfusion.com/file-formats/pdf/working-with-image-extraction
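A rough sketch of the image-based check, again assuming the Syncfusion PDF library for Windows Forms, where `PdfPageBase.ExtractImages()` returns a `System.Drawing.Image[]` (other Syncfusion platforms may return a different type). The names `PdfImageCheck` and `ContainsImages` are hypothetical, and the result is only meaningful under the stated assumption that your searchable PDFs contain no images.

```csharp
using System;
using System.Drawing;          // Image
using Syncfusion.Pdf;          // PdfPageBase
using Syncfusion.Pdf.Parsing;  // PdfLoadedDocument

// Hypothetical helper; not part of the Syncfusion API.
public static class PdfImageCheck
{
    // Returns true when any page contains at least one extractable image.
    public static bool ContainsImages(string path)
    {
        PdfLoadedDocument document = new PdfLoadedDocument(path);
        try
        {
            for (int i = 0; i < document.Pages.Count; i++)
            {
                PdfPageBase page = document.Pages[i];

                // Assumes the Windows Forms build, where this returns Image[].
                Image[] images = page.ExtractImages();
                bool found = images != null && images.Length > 0;

                // Only the count matters here, so free the extracted bitmaps.
                if (images != null)
                {
                    foreach (Image image in images)
                        image?.Dispose();
                }

                if (found)
                    return true;
            }
            return false;
        }
        finally
        {
            document.Close(true);
        }
    }
}
```

If your searchable PDFs may also contain images (logos, photos), a stricter variant would flag a page as scanned only when it has images and no extractable text, combining this check with the text-based one.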
Note: I work for Syncfusion.
Upvotes: 2