How to insert text into a scanned PDF document using Java

Question

I have to add text to PDF documents, many of which are scanned. In the scanned PDFs, the inserted text ends up behind the scanned image instead of on top of it. How can I add text over the scanned image inside the PDF?

package editExistingPDF;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import jxl.Cell;
import jxl.Sheet;
import jxl.Workbook;
import jxl.read.biff.BiffException;

import org.apache.commons.io.FilenameUtils;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;

public class AddPragraphToPdf {



    public static void main(String[] args) throws IOException, DocumentException, BiffException {

        String tan = "no tan";
        File inputWorkbook = new File("lars.xls");
        Workbook w;

            w = Workbook.getWorkbook(inputWorkbook);
            // Get the first sheet
            Sheet sheet = w.getSheet(0);

            Cell[] tnas =sheet.getColumn(0);


        File ArticleFolder = new File("C:\\Documents and Settings\\sathishkumarkk\\My Documents\\article");
        File[] listOfArticles = ArticleFolder.listFiles();

        for (int ArticleInList = 0; ArticleInList < listOfArticles.length; ArticleInList++)  
        { 
            Document document = new Document(PageSize.A4);

      //  System.out.println(listOfArticles[ArticleInList].toString());
        PdfReader pdfArticle = new PdfReader(listOfArticles[ArticleInList].toString());
        if(listOfArticles[ArticleInList].getName().contains(".si."))
        {continue;}
        int noPgs=pdfArticle.getNumberOfPages();
        String ArticleNoWithOutExt = FilenameUtils.removeExtension(listOfArticles[ArticleInList].getName());
        String TanNo=ArticleNoWithOutExt.substring(0,ArticleNoWithOutExt.indexOf('.'));

     // Create output PDF
        PdfWriter writer = PdfWriter.getInstance(document,new FileOutputStream("C:\\Documents and Settings\\sathishkumarkk\\My Documents\\toPrint\\"+ArticleNoWithOutExt+".pdf"));
        document.open();
        PdfContentByte cb = writer.getDirectContent();
        // Get the TAN from the Excel sheet
        System.out.println(TanNo);
        for(Cell content : tnas){
            if(content.getContents().contains(TanNo)){
                tan=content.getContents();
                System.out.println(tan);
            }else{
                continue;
            }
        }
        // Load existing PDF
        //PdfReader reader = new PdfReader(new FileInputStream("1.pdf"));

          for (int i = 1; i <= noPgs; i++) {
        PdfImportedPage page = writer.getImportedPage(pdfArticle, i); 

        // Copy page i of the existing PDF into the output PDF
        document.newPage();
        cb.addTemplate(page, 0, 0);
        // Add your TAN here
        Paragraph p= new Paragraph(tan);
        Font font = new Font();
        font.setSize(1.0f);
        p.setLeading(12.0f, 1.0f);
        p.setFont(font);

        document.add(p); 
          }
        document.close();
        }
    }

}

NOTE: When a PDF contains only text, I have no problem. But when a PDF consists entirely of scanned pages and I try to add text, the text is added behind the scanned image, so it does not appear when I print the PDF.

Grooveek · Accepted Answer

From this iText example (it does the reverse of what you want, but replace getUnderContent with getOverContent and you'll be fine):

Each PDF page has two extra layers; one that sits on top of all text / graphics and one that goes to the bottom. All user added content gets in-between these two. If we get into this bottommost content, we can write anything under that we want. To get into this bottommost layer, we can use the "getUnderContent" method of PdfStamper object.
This is documented in iText API Reference as shown below:

public PdfContentByte getUnderContent(int pageNum)
    Gets a PdfContentByte to write under the page of the original document.
    Parameters:
       pageNum - the page number where the extra content is written
    Returns:
         a PdfContentByte to write under the page of the original document
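
Applied to the question, a minimal sketch might look like the following. It assumes iText 5 (the com.itextpdf.text packages the question already imports). In the question's code the scanned page is drawn with getDirectContent(), which iText paints above anything added through document.add(), so the paragraph ends up hidden behind the image. Stamping onto the over-content layer avoids that. The class name StampOverScan, the method stampOnEveryPage, and the 36-point offset from the top-left corner are illustrative choices, not part of iText.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.ColumnText;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

public class StampOverScan {

    // Hypothetical helper: writes `text` near the top-left corner of every page,
    // on the layer that sits above the existing content (scanned images included).
    public static byte[] stampOnEveryPage(byte[] pdfBytes, String text)
            throws IOException, DocumentException {
        PdfReader reader = new PdfReader(pdfBytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PdfStamper stamper = new PdfStamper(reader, out);
        try {
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                // Page size as displayed, so the offset also works on rotated pages
                Rectangle size = reader.getPageSizeWithRotation(i);
                // getOverContent returns the layer on top of the original page
                PdfContentByte over = stamper.getOverContent(i);
                // 36 pt (half an inch) in from the left and down from the top: an assumed margin
                ColumnText.showTextAligned(over, Element.ALIGN_LEFT, new Phrase(text),
                        size.getLeft() + 36, size.getTop() - 36, 0);
            }
        } finally {
            // Closing the stamper flushes the modified PDF into `out`
            stamper.close();
            reader.close();
        }
        return out.toByteArray();
    }
}
```

In the question's loop, this would replace the Document / PdfWriter / addTemplate part: read each article into a byte array, call stampOnEveryPage(bytes, tan), and write the result to the toPrint folder.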
