sergio trajano

Reputation: 311

Extract data from Foxit Reader textbox comments using iText

Suppose you have a PDF document that contains this text as a scanned image, with no OCR applied:

"I am Sam, I am 28 years old and tomorrow is april/18/2018."

Is it possible to insert form controls right below "Sam", "28" and "april/18/2018", so that the user can type exactly that information into the form controls, and the values in the form controls can then be read by program code?

Could iTextSharp do that? Or maybe a simpler tool?

EDIT 1: Below I try to make my goal clearer (sorry for my English).

In my job I have to extract a lot of information from old scanned documents that have no OCR layer. Applying OCR to them is not an option. What I would like to do is: (a) I open the PDF document and start reading it. (b) Every time I find a piece of information that I will need, such as a date of birth, I insert a text box near it (below it, for example) and type that date of birth into the text box. (c) After inserting all the text boxes I want (names, ages, dates of birth, incomes, etc.) and typing into them the information read from the PDF, I would like to be able to process the contents of all those text boxes in my own code, in C# for example.

Text boxes inserted and filled in by the user. Here I used the text-box feature of Foxit Reader, which is close to what I want.

Upvotes: 0

Views: 417

Answers (2)

mkl

Reputation: 95928

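Using iText 7 (Java) you can extract the textbox comments like this:

import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;

// Open the document read-only; try-with-resources closes it afterwards.
try (   PdfReader pdfReader = new PdfReader("HelloFOXIT.pdf");
        PdfDocument pdfDocument = new PdfDocument(pdfReader)   ) {
    // Page numbers in iText are 1-based.
    for (int pageNr = 1; pageNr <= pdfDocument.getNumberOfPages(); pageNr++) {
        System.out.printf("\n\nPage %d\n\n", pageNr);
        PdfPage page = pdfDocument.getPage(pageNr);
        // The typed text is stored in the Contents entry of each annotation.
        for (PdfAnnotation pdfAnnotation : page.getAnnotations()) {
            System.out.printf("- %s\n", pdfAnnotation.getContents());
        }
    }
}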

The output:

Page 1

- 28
- 18/04/2018
- SAM
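
Once the comment strings have been extracted, they still have to be interpreted by your own code. The following is only a minimal sketch of one way to do that with the Java standard library, not part of iText: the class name `CommentValues`, the method `classify`, the labels it returns, and the assumed `dd/MM/yyyy` date layout are all hypothetical and chosen to match the sample output above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper: sorts extracted comment strings into rough kinds.
class CommentValues {
    // Assumption: dates are typed as day/month/year, e.g. "18/04/2018".
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("dd/MM/uuuu");

    /** Returns "DATE", "NUMBER" or "TEXT" for one extracted comment string. */
    static String classify(String value) {
        String trimmed = value.trim();
        try {
            LocalDate.parse(trimmed, DATE);
            return "DATE";
        } catch (DateTimeParseException e) {
            // Not a date in the assumed layout; try the other kinds.
        }
        if (trimmed.matches("\\d+")) {
            return "NUMBER";
        }
        return "TEXT";
    }

    public static void main(String[] args) {
        // Sample values taken from the output above.
        String[] extracted = { "28", "18/04/2018", "SAM" };
        for (String value : extracted) {
            System.out.println(value + " -> " + classify(value));
        }
    }
}
```

With the three sample values this labels "28" as a number, "18/04/2018" as a date and "SAM" as text. Telling apart several values of the same kind (two dates on one page, say) would need extra information, such as the annotation's position, which this sketch does not look at.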

Upvotes: 1

sergio trajano

Reputation: 311

Using iText, it is possible to extract the comments added to a PDF with Foxit Reader's Callout feature. As mkl explained in the comments on the question, those Foxit Reader comments are "contained in the Contents entries of the annotation dictionaries."

Upvotes: 0
