Cedric

Reputation: 21

Splitting a PDF into separate single-page PDFs

I am trying to use the iText library (import com.itextpdf) to create a new PDF document for each page of an existing PDF document.

For example, for a.pdf, which has 3 pages, I am creating a1.pdf, a2.pdf and a3.pdf, with a1.pdf being the first page of a.pdf, and so on.

For some reason the output is incorrect. If a.pdf has only one page, the newly created file has a different hash than the original. Any help is appreciated.

public static void onePage(int num, String to, PdfReader reader) throws DocumentException,IOException {
    Document document = new Document(PageSize.A4);

    PdfWriter writer = PdfWriter.getInstance(document,new FileOutputStream(to));
    document.open();

    PdfImportedPage page;
    page = writer.getImportedPage(reader, num);
    Image instance = Image.getInstance(page);

    instance.setAbsolutePosition(0, 30);

    document.add(instance);

    document.close();

}
public static void makePages(String name) throws IOException, DocumentException{

    PdfReader reader = new PdfReader(name+".pdf");
    int n = reader.getNumberOfPages();
    for(int i=1; i<=n;i++){
        onePage(i,  name+i+".pdf", reader);
    }
}

Upvotes: 0

Views: 5488

Answers (3)

UdayKiran Pulipati

Reputation: 6667

The programs below split the pages of 04-Request-Headers.pdf into individual PDF files using PDFBox.

Download the latest PDFBox jars from the Apache PDFBox releases page.

Solution for Apache PDFBox 1.8.*: the Java program below needs pdfbox-1.8.3.jar and commons-logging-1.1.3.jar.

import java.io.File;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
/**
 * 
 * @version 1.8.3
 *
 * @author udaykiran.pulipati
 *
 */

@SuppressWarnings("unchecked")
public class ExtractPagesFromPdfAndSaveAsNewPDFPage {
    public static void main(String[] args) {
        try {
            String sourceDir = "C:/PDFCopy/04-Request-Headers.pdf";
            String destinationDir = "C:/PDFCopy/";
            File oldFile = new File(sourceDir);
            String fileName = oldFile.getName().replace(".pdf", "");
            if (oldFile.exists()) {
                // Create the destination directory if it is missing
                File newFile = new File(destinationDir);
                if (!newFile.exists()) {
                    newFile.mkdir();
                }

                PDDocument document = PDDocument.load(sourceDir);
                List<PDPage> list = document.getDocumentCatalog().getAllPages();

                // Save each page as its own single-page document
                int pageNumber = 1;
                for (PDPage page : list) {
                    PDDocument newDocument = new PDDocument();
                    newDocument.addPage(page);

                    newFile = new File(destinationDir + fileName + "_" + pageNumber + ".pdf");
                    newFile.createNewFile();

                    newDocument.save(newFile);
                    newDocument.close();
                    pageNumber++;
                }
                // Release the source document once every page has been written
                document.close();
            } else {
                System.err.println(fileName + " File does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Solution for Apache PDFBox 2.0.* version:

Required jars: pdfbox-2.0.16.jar, fontbox-2.0.16.jar and commons-logging-1.2.jar, or the equivalent pom.xml dependencies:

<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/fontbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
    <version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-logging/commons-logging -->
<dependency>
    <groupId>commons-logging</groupId>
    <artifactId>commons-logging</artifactId>
    <version>1.2</version>
</dependency>

Solution:

package com.java.pdf.pdfbox.examples;

import java.io.File;
import java.util.Iterator;
import java.util.List;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

/**
 * 
 * @version 2.0.16
 * 
 * @author udaykiran.pulipati
 * 
 */

public class ExtractPDFPagesAndSaveAsNewPDFPage {
    public static void main(String[] args) {
        try {
            String sourceDir = "C:\\Users\\udaykiranp\\Downloads\\04-Request-Headers.pdf";
            String destinationDir = "C:\\Users\\udaykiranp\\Downloads\\PDFCopy\\";
            File oldFile = new File(sourceDir);
            String fileName = oldFile.getName().replace(".pdf", "");
            if (oldFile.exists()) {
                // Create the destination directory if it is missing
                File newFile = new File(destinationDir);
                if (!newFile.exists()) {
                    newFile.mkdir();
                }

                PDDocument document = PDDocument.load(oldFile);

                int totalPages = document.getNumberOfPages();
                System.out.println("Total Pages: " + totalPages);
                if (totalPages > 0) {
                    Splitter splitter = new Splitter();

                    List<PDDocument> pages = splitter.split(document);
                    Iterator<PDDocument> iterator = pages.listIterator();

                    // Save each page as an individual document
                    int i = 1;
                    while (iterator.hasNext()) {
                        PDDocument pd = iterator.next();
                        String pagePath = destinationDir + fileName + "_" + i + ".pdf";
                        pd.save(pagePath);
                        pd.close(); // release the single-page document after saving
                        System.out.println("Page " + i + ", Extracted to : " + pagePath);
                        i++;
                    }
                } else {
                    System.err.println("Blank / Empty PDF file: " + fileName + ", Contains " + totalPages + " pages.");
                }
                // Close the source only after all split documents are saved
                document.close();
            } else {
                System.err.println(fileName + " File does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Upvotes: 2

Michael

Reputation: 2500

The hash of the two PDFs is most likely only different because PDF documents contain a lot of additional metadata that is probably not being copied over identically when you copy the single page to a new PDF. This could be as insignificant as information about what the PDF was generated with and when. The easiest thing would be to simply not split the PDF at all if there is only one page.
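To illustrate why a re-written file hashes differently, here is a minimal sketch using only the JDK. The class name PdfHashCheck and the two byte strings are made up for illustration; the strings are stand-ins for file contents, not valid PDFs. The point is only that a change to any byte, such as a producer string, changes the digest even when the page content is the same.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of iText or PDFBox.
final class PdfHashCheck {

    // Returns the SHA-256 digest of the given bytes.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // True only if both byte arrays produce the same digest.
    static boolean sameHash(byte[] a, byte[] b) {
        return Arrays.equals(sha256(a), sha256(b));
    }

    public static void main(String[] args) {
        // Stand-in contents: same "page", different producer metadata.
        byte[] original = "%PDF-1.4 /Producer (Tool A) page-content"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] rewritten = "%PDF-1.4 /Producer (Tool B) page-content"
                .getBytes(StandardCharsets.US_ASCII);

        System.out.println("original vs original:  " + sameHash(original, original));
        System.out.println("original vs rewritten: " + sameHash(original, rewritten));
    }
}
```

With real files, the byte arrays could come from Files.readAllBytes on a.pdf and a1.pdf. A hash comparison only tells you whether the files are byte-identical, not whether the pages look the same.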

Upvotes: 1

Senthil

Reputation: 5804

You could check the number of pages, and if there is only one page you don't need to create a new PDF, right? That would be a simple fix for the problem.
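If a1.pdf still has to exist for a one-page input, one possible variation is to copy the original file byte for byte instead of re-writing it. The sketch below assumes the page count has already been read elsewhere (for example from reader.getNumberOfPages() in the question's code); SinglePageShortcut and copyIfSinglePage are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of iText or PDFBox.
final class SinglePageShortcut {

    // Copies source to target unchanged when the document has exactly one page.
    // Returns true if the copy was made, false if the caller should split as usual.
    static boolean copyIfSinglePage(Path source, Path target, int pageCount) {
        if (pageCount != 1) {
            return false;
        }
        try {
            // A plain file copy keeps every byte, so the hash stays the same.
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In makePages this could be called before the loop, skipping the loop when it returns true.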

Upvotes: 0
