Karthik

Reputation: 91

Highlight words in a pdf using itextsharp, not displaying highlighted word in browser

Words highlighted with iTextSharp show up in Adobe Reader but are not displayed when the PDF is opened in a browser.

Adobe: [screenshot]

Browser: [screenshot]

CODE

List<iTextSharp.text.Rectangle> MatchesFound = strategy.GetTextLocations(splitText[i].Trim(), StringComparison.CurrentCultureIgnoreCase);
foreach (Rectangle rect in MatchesFound)
{
    float[] quad = { rect.Left - 3.0f, rect.Bottom, rect.Right, rect.Bottom, rect.Left - 3.0f, rect.Top + 1.0f, rect.Right, rect.Top + 1.0f };
    //Create our highlight
    PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
    //Set the color
    highlight.Color = BaseColor.YELLOW;

    //Add the annotation
    stamper.AddAnnotation(highlight, pageno);
}

Kindly help me to solve this issue.

Updated Code

  private void highlightPDF()
{
    //Create a simple test file
    string outputFile = Server.MapPath("~/pdf/16193037V_Dhana-FI_NK-QA_Completed.pdf");
    string filename = "HL" + Convert.ToString(Session["Filename"]) + ".pdf";
    Session["Filename"] = "HL" + Convert.ToString(Session["Filename"]);
    //Create a new file from our test file with highlighting
    string highLightFile = Server.MapPath("~/pdf/" + filename);

    //Bind a reader and stamper to our test PDF

    PdfReader reader = new PdfReader(outputFile);
    iTextSharp.text.pdf.PdfContentByte canvas;
    int pageno = Convert.ToInt16(txtPageno.Text);
    using (FileStream fs = new FileStream(highLightFile, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        using (PdfStamper stamper = new PdfStamper(reader, fs))
        {
            canvas = stamper.GetUnderContent(pageno);
            myLocationTextExtractionStrategy strategy = new myLocationTextExtractionStrategy();
            strategy.UndercontentCharacterSpacing = canvas.CharacterSpacing;
            strategy.UndercontentHorizontalScaling = canvas.HorizontalScaling;

            string currentText = PdfTextExtractor.GetTextFromPage(reader, pageno, strategy);
            string text = txtHighlight.Text.Replace("\r\n", "").Replace("\\n", "\n").Replace("  ", " ");
            string[] splitText = text.Split(new string[] { "\n" }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < splitText.Length; i++)
            {
                List<iTextSharp.text.Rectangle> MatchesFound = strategy.GetTextLocations(splitText[i].Trim(), StringComparison.CurrentCultureIgnoreCase);
                foreach (Rectangle rect in MatchesFound)
                {
                    canvas.SaveState();
                    canvas.SetColorFill(BaseColor.YELLOW);
                    canvas.Rectangle(rect);
                    canvas.Fill();
                    canvas.RestoreState();                      
                }
            }

        }
    }
    reader.Close();
}

This does not highlight the text either. I pass in the text to highlight and the page number.

Upvotes: 1

Views: 7658

Answers (2)

mkl

Reputation: 95888

First of all...

Why does the OP's (updated) code not work

There actually are two factors.

First, there is an issue in the OP's code. To add a rectangle to the path, he uses

canvas.Rectangle(rect);

Unfortunately this does not do what he expects: The Rectangle class has multiple properties beyond the mere coordinates of a rectangle, foremost information about selected borders, border colors, and an interior color, and PdfContentByte.Rectangle(Rectangle) draws a rectangle according to those properties.

In the case at hand, though, rect is used only to transport the coordinates of a rectangle, so those additional properties all are false or null. Thus, canvas.Rectangle(rect) does nothing!

Instead the OP should use

canvas.Rectangle(rect.Left, rect.Bottom, rect.Width, rect.Height);

here.

Furthermore, @Bruno mentioned in his answer

Note that you won't see the yellow rectangle if you add it under an opaque shape (e.g. under an image).

Unfortunately exactly this is the case here: The document actually is a scanned document, each page being a page-filling image under which the equivalent text is drawn (probably after OCR'ing) to allow textual copy&paste.

Thus, whatever the OP's code may draw on the UnderContent, it will be hidden by that very image.

Thus, let's try something different...

How to make it work

@Bruno in his answer also indicated a solution for such a case:

In that case, you could add a transparent rectangle on top of the existing content.

Following this advice we replace

canvas = stamper.GetUnderContent(pageno);

by

canvas = stamper.GetOverContent(pageno);

PdfGState state = new PdfGState();
state.FillOpacity = .3f;
canvas.SetGState(state);

Selecting the word "support" on the third document page we get:

[Screenshot: using an opacity of .3]

The yellow is quite pale here.

Using an Opacity value of .6 instead we get

[Screenshot: using an opacity of .6]

Now the yellow is more intense but the text starts to pale out.
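The trade-off between the two opacity values follows from how a fill with constant opacity is composited in the Normal blend mode: each color channel of the result is a weighted average of backdrop and fill color. The following Java sketch (Java as in the iText example elsewhere on this page) computes this for a yellow fill over white paper and over black text, with channels as values between 0 and 1. The class and method names are made up for illustration and are not iText API, and the sketch assumes a fully opaque backdrop.

```java
import java.util.Arrays;

// Illustrative only: OpacityDemo is a hypothetical helper, not part of iText.
final class OpacityDemo {
    // Normal blend mode with constant fill opacity alpha, per channel:
    // result = backdrop + alpha * (source - backdrop)
    static double[] composite(double[] backdrop, double[] source, double alpha) {
        double[] result = new double[backdrop.length];
        for (int i = 0; i < backdrop.length; i++) {
            result[i] = backdrop[i] + alpha * (source[i] - backdrop[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] yellow = {1, 1, 0}; // fill color (RGB)
        double[] white = {1, 1, 1};  // paper
        double[] black = {0, 0, 0};  // text
        for (double alpha : new double[] {0.3, 0.6}) {
            System.out.println("opacity " + alpha
                + ": paper -> " + Arrays.toString(composite(white, yellow, alpha))
                + ", text -> " + Arrays.toString(composite(black, yellow, alpha)));
        }
    }
}
```

At an opacity of .3 the paper keeps about 70% of its blue component, hence the pale yellow. At .6 the yellow is stronger, but the black text is lifted to about 60% in red and green, so it visibly pales.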

For tasks like this I actually prefer using the blend mode Darken. This can be done by using

state.BlendMode = new PdfName("Darken");

instead of state.FillOpacity = .3f. This results in

[Screenshot: using the blend mode Darken]

This IMO looks better.

How the client did it

The OP commented

Client have given a pdf. In that, they highlighted text, the highlighted text is displayed in browser

The client's PDF actually uses annotations, just like the OP in his original code, but in contrast each of the client's annotations contains an appearance stream which the highlight annotations generated by iText don't.

Supplying an appearance is optional and PDF viewers indeed should generate an appearance if none is given. Obviously, though, there are numerous PDF viewers which rely on appearances the PDF brings along.

By the way, the appearances in the client's PDF actually use the blend mode Multiply. For underlying white and black colors, Darken and Multiply have the same result.
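That equivalence can be checked per color channel: in the PDF imaging model, Darken takes the minimum of backdrop and source, while Multiply takes their product. A small Java sketch follows; the class and method names are hypothetical, and the pale yellow is an example value chosen here, not taken from the client's PDF.

```java
// Illustrative only: BlendModes is a hypothetical helper, not part of iText.
final class BlendModes {
    // Darken: the darker of backdrop and source, per channel.
    static double darken(double backdrop, double source) {
        return Math.min(backdrop, source);
    }

    // Multiply: the product of backdrop and source, per channel.
    static double multiply(double backdrop, double source) {
        return backdrop * source;
    }

    public static void main(String[] args) {
        double[] paleYellow = {1.0, 1.0, 0.5}; // example highlight color (assumption)
        double[] backdrops = {1.0, 0.0, 0.5};  // white, black, mid-gray
        for (double b : backdrops) {
            for (double s : paleYellow) {
                System.out.println("backdrop " + b + ", source " + s
                    + ": darken=" + darken(b, s) + ", multiply=" + multiply(b, s));
            }
        }
    }
}
```

For a white backdrop (1) both functions return the source channel, and for a black backdrop (0) both return 0. They differ only where an in-between backdrop meets an in-between source channel, e.g. mid-gray under the 0.5 blue channel, where Darken yields 0.5 and Multiply 0.25. With a pure yellow of (1, 1, 0), every source channel is 0 or 1, so the two modes agree for any backdrop.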

Making it work with annotations

In a comment the OP wondered

Please one more doubt, if the user wrongly highlighted then how to remove yellow color(or change yellow to white)? i changed yellow to white but it's not working. canvas.SetColorFill(BaseColor.WHITE);

Undoing a change to the page content generally is more difficult than undoing the addition of an annotation. Thus, let's make the OP's original code also work, i.e. adding an appearance stream to the highlight annotations.

As the OP reported in another comment, his first attempt to add an appearance stream failed:

PdfAppearance appearance = PdfAppearance.CreateAppearance(stamper.Writer, rect.Width, rect.Height);
appearance.Rectangle(rect.Left, rect.Bottom, rect.Width, rect.Height);
appearance.SetColorFill(BaseColor.WHITE);
appearance.Fill();
highlight.SetAppearance( PdfAnnotation.APPEARANCE_NORMAL, appearance );
stamper.AddAnnotation(highlight, pageno);

but it's not working.

The problems in his attempt are:

  • The origin of the appearance template is in the lower left corner of the annotation area, not of the page. To color the area in question, therefore, the rectangle must have its lower left at (0, 0).
  • Strictly speaking the color must be set before starting the path building.
  • A different color than white should be used for highlighting.
  • Transparency or an appropriate blend mode should be used to allow the original, marked text to shine through.

Thus, the following code shows how to do it.

private void highlightPDFAnnotation(string outputFile, string highLightFile, int pageno, string[] splitText)
{
    PdfReader reader = new PdfReader(outputFile);
    iTextSharp.text.pdf.PdfContentByte canvas;
    using (FileStream fs = new FileStream(highLightFile, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        using (PdfStamper stamper = new PdfStamper(reader, fs))
        {
            myLocationTextExtractionStrategy strategy = new myLocationTextExtractionStrategy();
            strategy.UndercontentHorizontalScaling = 100;

            string currentText = PdfTextExtractor.GetTextFromPage(reader, pageno, strategy);
            for (int i = 0; i < splitText.Length; i++)
            {
                List<iTextSharp.text.Rectangle> MatchesFound = strategy.GetTextLocations(splitText[i].Trim(), StringComparison.CurrentCultureIgnoreCase);
                foreach (Rectangle rect in MatchesFound)
                {
                    float[] quad = { rect.Left - 3.0f, rect.Bottom, rect.Right, rect.Bottom, rect.Left - 3.0f, rect.Top + 1.0f, rect.Right, rect.Top + 1.0f };
                    //Create our highlight
                    PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
                    //Set the color
                    highlight.Color = BaseColor.YELLOW;

                    PdfAppearance appearance = PdfAppearance.CreateAppearance(stamper.Writer, rect.Width, rect.Height);
                    PdfGState state = new PdfGState();
                    state.BlendMode = new PdfName("Multiply");
                    appearance.SetGState(state);
                    //Set the fill color before starting to build the path
                    appearance.SetColorFill(BaseColor.YELLOW);
                    appearance.Rectangle(0, 0, rect.Width, rect.Height);
                    appearance.Fill();

                    highlight.SetAppearance(PdfAnnotation.APPEARANCE_NORMAL, appearance);

                    //Add the annotation
                    stamper.AddAnnotation(highlight, pageno);
                }
            }
        }
    }
    reader.Close();
}

These annotations are displayed by Chrome, too, and as annotations they can easily be removed.

Upvotes: 7

Bruno Lowagie

Reputation: 77528

You are using a Markup annotation to highlight text. That's great! There's nothing wrong with your code, nor with iText. However: not all PDF viewers support that functionality.

If you want to see highlighted text in every PDF viewer, a (sub-optimal) workaround could be to add a yellow rectangle to the content stream under the existing content (assuming that the existing content isn't opaque).

This is demonstrated in the HighLightByAddingContent example:

public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    PdfContentByte canvas = stamper.getUnderContent(1);
    canvas.saveState();
    canvas.setColorFill(BaseColor.YELLOW);
    canvas.rectangle(36, 786, 66, 16);
    canvas.fill();
    canvas.restoreState();
    stamper.close();
    reader.close();
}

In this example, we take a file named hello.pdf and we add a yellow rectangle, with the file hello_highlighted.pdf as result.

Note that you won't see the yellow rectangle if you add it under an opaque shape (e.g. under an image). In that case, you could add a transparent rectangle on top of the existing content.

Update: my example was written in Java. It shouldn't be a problem for a developer to port this to C#. It's only a matter of changing some lower-cases into upper-cases. E.g. stamper.GetUnderContent(1) instead of stamper.getUnderContent(1), canvas.SaveState() instead of canvas.saveState(), and so on.

Upvotes: 3
