Hetote

Reputation: 340

Removing PDF invisible objects with iTextSharp

Is it possible to use iTextSharp to remove objects that are not visible (or at least not being displayed) from a PDF document?

More details:

1) My source is a PDF page containing images and text (maybe some vector drawings) and embedded fonts.

2) There's an interface to design multiple 'crop boxes'.

3) I must generate a new PDF that contains only what is inside the crop boxes. Everything else must be removed from the resulting document (I could accept content that is half inside and half outside a box, but that is not ideal, and it should not be displayed anyway).

My solution so far:

I have developed a solution that creates a temporary document for each crop box, containing that box's content (using writer.GetImportedPage and contentByte.AddTemplate on a page that is exactly the size of the crop box). Then I create the final document and repeat the process, using the AddTemplate method to position each "cropped page" on the final page.

This solution has 2 big disadvantages:

So I think I need to iterate through the PDF objects, detect whether each one is visible, and delete those that are not. At the time of writing, I am trying to use pdfReader.GetPdfObject.

Thanks for the help.

Upvotes: 31

Views: 6642

Answers (5)

Praveena M

Reputation: 522

If the PDF you are working with is a fixed, predefined template, and the object you want to remove is a form field, you can remove it by calling RemoveField.

PdfReader pdfReader = new PdfReader("../Template_Path.pdf");
PdfStamper pdfStamperToPopulate = new PdfStamper(pdfReader, new FileStream(outputPath, FileMode.Create));
AcroFields pdfFormFields = pdfStamperToPopulate.AcroFields;
pdfFormFields.RemoveField("fieldNameToBeRemoved");
// Close the stamper so the modified document is actually written out.
pdfStamperToPopulate.Close();
pdfReader.Close();

Upvotes: 1

Max

Reputation: 1820

Here are three solutions I found, in case they help someone (using iTextSharp, Amyuni, or Tracker-Software, since @Hetote said in the comments that he was looking for another library):

Using iTextSharp

As answered by @martinbuberl in another question:

public static void CropDocument(string file, string oldchar, string repChar)
{
    int pageNumber = 1;
    PdfReader reader = new PdfReader(file);
    // Rectangle takes (llx, lly, urx, ury): the last two values are the
    // upper-right corner, not a width and a height.
    iTextSharp.text.Rectangle size = new iTextSharp.text.Rectangle(
        Globals.fX,
        Globals.fY,
        Globals.fWidth,
        Globals.fHeight);
    Document document = new Document(size);
    PdfWriter writer = PdfWriter.GetInstance(document,
        new FileStream(file.Replace(oldchar, repChar),
            FileMode.Create, FileAccess.Write));
    document.Open();
    PdfContentByte cb = writer.DirectContent;
    document.NewPage();
    // Import the source page and draw it onto the smaller output page.
    PdfImportedPage page = writer.GetImportedPage(reader, pageNumber);
    cb.AddTemplate(page, 0, 0);
    document.Close();
    reader.Close();
}

Another approach comes from @rafixwpt's question. It doesn't remove the invisible elements; it cleans an area of the page, which can affect other parts of the page:

static void textsharpie()
{
    string file = "C:\\testpdf.pdf";
    string oldchar = "testpdf.pdf";
    string repChar = "test.pdf";
    PdfReader reader = new PdfReader(file);
    PdfStamper stamper = new PdfStamper(reader, new FileStream(file.Replace(oldchar, repChar), FileMode.Create, FileAccess.Write));
    List<PdfCleanUpLocation> cleanUpLocations = new List<PdfCleanUpLocation>();
    cleanUpLocations.Add(new PdfCleanUpLocation(1, new iTextSharp.text.Rectangle(0f, 0f, 600f, 115f), iTextSharp.text.BaseColor.WHITE));
    PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
    cleaner.CleanUp();
    stamper.Close();
    reader.Close();
}

Using Amyuni

As answered by @yms in another question:

IacDocument.GetObjectsInRectangle Method

The GetObjectsInRectangle method gets all the objects that are in the specified rectangle.

Then you can iterate all the objects in the page and delete those that you are not interested in:

// Open a PDF document
document.Open(testfile, "");
IacPage page1 = document.GetPage(1);
Amyuni.PDFCreator.IacAttribute attribute = page1.AttributeByName("Objects");

// listObj holds all the graphic objects on the page
var listObj = ((System.Collections.ArrayList)attribute.Value).Cast<IacObject>();

// listObjToKeep holds the graphic objects that intersect the rectangle
var listObjToKeep = document.GetObjectsInRectangle(0f, 0f, 600f, 115f, IacGetRectObjectsConstants.acGetRectObjectsIntersecting).Cast<IacObject>();
foreach (IacObject pdfObj in listObj.Except(listObjToKeep))
{
    // pdfObj is not visible inside the rectangle, so delete it
    pdfObj.Delete(false);
}

As @yms said in the comments, the IacDocument.Redact method, new in version 5.0, can also be used to delete all the objects in a specified rectangle and draw a solid-color rectangle in their place.

Using Tracker-Software Editor SDK

I didn't try it but it seems possible, see this post.

Upvotes: 1

HABJAN

Reputation: 9328

Yes, it's possible. You need to parse the PDF page content bytes into PdfObjects, keep them in memory, delete the unwanted PdfObjects, build the page content bytes back from the remaining PdfObjects, and replace the page content in the PdfReader just before you import the page via PdfWriter.

I would recommend you to check out this: http://habjan.blogspot.com/2013/09/proof-of-concept-converting-pdf-files.html

The sample from the link implements parsing the PDF content bytes, building them back from PdfObjects, replacing the PdfReader page content bytes...

Upvotes: 1

Amin AmiriDarban

Reputation: 2068

PdfReader pdfReader = new PdfReader("../Template_Path.pdf");
PdfStamper pdfStamperToPopulate = new PdfStamper(pdfReader, new FileStream(outputPath, FileMode.Create));
AcroFields pdfFormFields = pdfStamperToPopulate.AcroFields;
pdfFormFields.RemoveField("fieldNameToBeRemoved");

Upvotes: 1

B2K

Reputation: 2611

Have you tried using an IRenderListener? You can selectively add only those elements to the new pdf which fall within the crop regions by examining the StartPoint and EndPoint or Area of the TextRenderInfo or ImageRenderInfo objects.
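
The decision such a listener has to make for each element is a plain rectangle test. Below is a minimal sketch of that test only, with no iTextSharp calls; the names Box, Placement and CropFilter are hypothetical helpers invented for illustration, and coordinates are assumed to be in PDF user space (origin at the bottom left).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper types; none of these names come from iTextSharp.
public enum Placement { Outside, Partial, Inside }

public struct Box
{
    public readonly float Left, Bottom, Right, Top;

    public Box(float left, float bottom, float right, float top)
    {
        // Normalize so that Left <= Right and Bottom <= Top.
        Left = Math.Min(left, right);
        Bottom = Math.Min(bottom, top);
        Right = Math.Max(left, right);
        Top = Math.Max(bottom, top);
    }

    // True when 'other' lies entirely within this box (edges may touch).
    public bool Contains(Box other) =>
        other.Left >= Left && other.Right <= Right &&
        other.Bottom >= Bottom && other.Top <= Top;

    // True when the two boxes share some interior area.
    public bool Intersects(Box other) =>
        other.Left < Right && other.Right > Left &&
        other.Bottom < Top && other.Top > Bottom;
}

public static class CropFilter
{
    // Classifies an element's bounding box against the crop boxes.
    // An element covered only by the union of several crop boxes is
    // reported as Partial, because each box is checked on its own.
    public static Placement Classify(Box item, IEnumerable<Box> cropBoxes)
    {
        var result = Placement.Outside;
        foreach (var crop in cropBoxes)
        {
            if (crop.Contains(item)) return Placement.Inside;
            if (crop.Intersects(item)) result = Placement.Partial;
        }
        return result;
    }
}
```

Inside a listener, the item box would be built from the coordinates that the TextRenderInfo or ImageRenderInfo object reports. Elements classified as Outside are skipped, and Partial ones can be kept or dropped depending on how strict the crop has to be.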

Upvotes: 0
