Ahmad

Reputation: 59

Extract PDF Annotation

I need to extract and read only the annotations of a PDF using C#.

I can extract the file's content without any problem using both PDFBox and iTextSharp, but I need to read the annotation text, or the underlined or coloured (highlighted) lines.

Any idea?

Upvotes: 1

Views: 1382

Answers (1)

Bruno Lowagie

Reputation: 77606

You need to understand that there is a difference between the actual content of a page (the content that is described using PDF syntax in the content stream of a page) and the annotations that are added to a page (the content that is described in the annotation dictionaries in the /Annots entry of the page dictionary).

So far, you are extracting the content of the annotation dictionaries, but you also want the text from the page's content stream at the location identified by the annotation's /Rect entry. To get that, you need to parse the content stream of the page.
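To make the first half of that distinction concrete, here is a minimal sketch of reading the annotation dictionaries themselves, assuming iTextSharp 5.x. The class and method names (AnnotationLister, ListAnnotations) are hypothetical and only serve the illustration; the text printed is whatever the annotation stores in its /Contents entry, not the page text underneath it.

```csharp
using System;
using iTextSharp.text.pdf;

static class AnnotationLister
{
    // Prints the subtype and /Contents text of every annotation on one page.
    // Hypothetical helper, assuming iTextSharp 5.x.
    public static void ListAnnotations(PdfReader reader, int page)
    {
        PdfDictionary pageDict = reader.GetPageN(page);
        PdfArray annots = pageDict.GetAsArray(PdfName.ANNOTS);
        if (annots == null) return; // the page has no /Annots entry

        for (int i = 0; i < annots.Size; i++)
        {
            PdfDictionary annot = annots.GetAsDict(i);
            if (annot == null) continue;

            // e.g. /Highlight, /Underline, /Text
            PdfName subtype = annot.GetAsName(PdfName.SUBTYPE);
            // /Contents is optional, so it may be missing
            PdfString contents = annot.GetAsString(PdfName.CONTENTS);

            Console.WriteLine("{0}: {1}", subtype,
                contents == null ? "(no /Contents)" : contents.ToUnicodeString());
        }
    }
}
```

For highlight and underline annotations /Contents is often empty, which is why the marked text has to come from the content stream instead.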

Please go to the official iText web site and read the FAQ, more specifically: How to read text from a specific position?

Suppose that reader is your PdfReader instance, rect is the Rectangle defining the location of the text you want to extract, and page the corresponding page number, then you can create a RenderFilter and use the LocationTextExtractionStrategy like this:

// Requires: using iTextSharp.text.pdf.parser;
// Keep only the text whose position falls inside rect.
RenderFilter[] filter = {new RegionTextRenderFilter(rect)};
ITextExtractionStrategy strategy =
    new FilteredTextRenderListener(
        new LocationTextExtractionStrategy(), filter);
string text = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
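The snippet above takes rect as given. As a rough sketch of where it could come from, the following assumes iTextSharp 5.x and an annotation dictionary already read from the page's /Annots array; the names MarkedTextExtractor and GetMarkedText are hypothetical. It builds the rectangle from the annotation's /Rect entry and then applies the same filter.

```csharp
using System;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static class MarkedTextExtractor
{
    // Returns the page text inside the annotation's /Rect, or null if /Rect is unusable.
    // Hypothetical helper, assuming iTextSharp 5.x.
    public static string GetMarkedText(PdfReader reader, int page, PdfDictionary annot)
    {
        PdfArray r = annot.GetAsArray(PdfName.RECT);
        if (r == null || r.Size < 4) return null;

        float x1 = r.GetAsNumber(0).FloatValue;
        float y1 = r.GetAsNumber(1).FloatValue;
        float x2 = r.GetAsNumber(2).FloatValue;
        float y2 = r.GetAsNumber(3).FloatValue;

        // Order the corners explicitly in case the file stores them the other way round.
        Rectangle rect = new Rectangle(
            Math.Min(x1, x2), Math.Min(y1, y2),
            Math.Max(x1, x2), Math.Max(y1, y2));

        RenderFilter[] filter = { new RegionTextRenderFilter(rect) };
        ITextExtractionStrategy strategy =
            new FilteredTextRenderListener(
                new LocationTextExtractionStrategy(), filter);
        return PdfTextExtractor.GetTextFromPage(reader, page, strategy);
    }
}
```

Note that /Rect is only a bounding box. For a highlight that spans several lines it can cover more text than was actually marked, in which case the annotation's /QuadPoints entry, which describes each marked region separately, is likely the better source for the rectangles.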

Upvotes: 1
