Reputation:
I have PDF documents from a scanner. These PDFs contain forms filled out and signed by staff for a day's work. I want to place a barcode or a standard area for OCR text on every form type so that the batch scan can be programmatically broken apart into separate PDF documents based on form type.
I would like to do this in Microsoft .NET 2.0.
I can purchase the required Adobe or other namespaces/DLLs needed to accomplish the task if there are no open source namespaces/DLLs available.
Upvotes: 2
Views: 3035
Reputation: 7889
Check out the Tesseract .NET wrapper (v 2.04.0) around the C++ OCR engine of the same name, originally developed by HP in the late '80s and early '90s. It won awards for its ingenuity.
Upvotes: 0
Reputation: 42307
From the title of your question I'm assuming that you just need to break apart PDF files and that they are already OCR'd. There are a few open source .NET PDF libraries out there. I have successfully used PDFSharp in a project of my own.
Here is a quick snippet that shows how to cull out each page from a PDF document using PDFSharp:
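using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

string filePath = @"c:\file.pdf";
// Import mode is required to copy pages into another document.
using (PdfDocument ipdf = PdfReader.Open(filePath, PdfDocumentOpenMode.Import))
{
    int i = 1;
    foreach (PdfPage page in ipdf.Pages)
    {
        // Write each page to its own single-page PDF.
        using (PdfDocument opdf = new PdfDocument())
        {
            opdf.Version = ipdf.Version;
            opdf.AddPage(page);
            opdf.Save("page " + i++ + ".pdf");
        }
    }
}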
Assuming you also need to access the text in the document for grouping, you can start from the PdfPage.Contents property, which exposes the page's raw content streams.
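Once each page has a form-type label (from a barcode or an OCR zone), deciding where to cut the batch is plain bookkeeping. Here is a minimal sketch of that step in C# 2.0. `FormRange` and `BatchSplitter` are hypothetical names, not part of PDFSharp, and the sketch assumes that only the first page of each form carries a label, so an unlabelled page (`null`) continues the form before it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type: describes one output document.
public class FormRange
{
    public string FormType;   // label read from the barcode or OCR zone
    public int FirstPage;     // zero-based index of the first page
    public int PageCount;     // number of consecutive pages in this form

    public FormRange(string formType, int firstPage)
    {
        FormType = formType;
        FirstPage = firstPage;
        PageCount = 1;
    }
}

public static class BatchSplitter
{
    // pageLabels[i] is the form type detected on page i, or null when the
    // page carries no label (assumed to continue the previous form).
    public static List<FormRange> FindRanges(IList<string> pageLabels)
    {
        List<FormRange> ranges = new List<FormRange>();
        for (int i = 0; i < pageLabels.Count; i++)
        {
            string label = pageLabels[i];
            if (label != null || ranges.Count == 0)
            {
                // A labelled page (or the very first page) starts a new form.
                ranges.Add(new FormRange(label ?? "UNKNOWN", i));
            }
            else
            {
                // Unlabelled page: attach it to the form in progress.
                ranges[ranges.Count - 1].PageCount++;
            }
        }
        return ranges;
    }
}
```

Each resulting range can then be written out with the same AddPage loop as in the snippet above, copying pages FirstPage through FirstPage + PageCount - 1 into one output document instead of one page per file.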
Upvotes: 1
Reputation: 19479
iTextSharp will help you split, reassemble, and apply barcodes to PDFs in .NET languages. I don't think it can OCR a document, but I haven't looked (I used the ABBYY FineReader Engine).
Upvotes: 1
Reputation: 2786
You can look into the iTextSharp library, which can split PDF files. But it isn't very good at reading the actual PDFs, so I have no idea how it would know where to split them.
There are companies that already do this for you. You can look into KwikTag.
Upvotes: 1
Reputation: 48147
Not a free or open source option, but you might also look at ABCpdf by WebSupergoo as another alternative to Adobe.
Upvotes: 2