Pallavipradeep

Reputation: 101

How to merge two Word documents saved as .docx into a third file?

I am trying to merge two documents, say Document 1: Merger1.docx and Document 2: Merger2.docx.

I would like to store the result in a new file, doc2.docx.

I used the code below to do this, but it throws an error.

CODE:

import java.io.*;
import org.apache.poi.hwpf.HWPFDocument; 
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Range;

public class MergerFiles {

public static void main (String[] args) throws Exception {  
    // POI apparently can't create a document from scratch,  
    // so we need an existing empty dummy document  
    HWPFDocument doc = new HWPFDocument(new FileInputStream("C:\\Users\\pallavi123\\Desktop\\Merger1.docx"));  
    Range range = doc.getRange();  


    //I can get the entire Document and insert it in the tmp.doc  
    //However any formatting in my word document is lost.  
    HWPFDocument doc2 = new HWPFDocument(new FileInputStream("C:\\Users\\pallavi123\\Desktop\\Merger2.docx"));  
    Range range2 = doc2.getRange();  
    range.insertAfter(range2.text());  

    //I can get the information (text only) for each character run/paragraph or section.  
    //Again any formatting in my word document is lost.  
    HWPFDocument doc3 = new HWPFDocument(new FileInputStream("D:\\doc2.docx"));  
    Range range3 = doc3.getRange();  
    for(int i=0;i<range3.numCharacterRuns();i++){  
        CharacterRun run3 = range3.getCharacterRun(i);  
        range.insertAfter(run3.text());  
    }  

    OutputStream out = new FileOutputStream("D:\\result.doc");  
    doc.write(out);  
    out.flush();  
    out.close();  
}  
}  

ERROR CODE:

Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:108)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
at org.apache.poi.hwpf.HWPFDocument.verifyAndBuildPOIFS(HWPFDocument.java:120)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:133)
at MergerFiles.main(MergerFiles.java:11)

Am I missing a jar file, or am I using the code the wrong way? I would appreciate your suggestions.

Thanks in Advance.

Upvotes: 2

Views: 15487

Answers (4)

victorpacheco3107

Reputation: 862

I've developed the following class:

import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;

public class WordMerge {

    private final OutputStream result;
    private final List<InputStream> inputs;
    private XWPFDocument first;

    public WordMerge(OutputStream result) {
        this.result = result;
        inputs = new ArrayList<>();
    }

    public void add(InputStream stream) throws Exception{            
        inputs.add(stream);
        OPCPackage srcPackage = OPCPackage.open(stream);
        XWPFDocument src1Document = new XWPFDocument(srcPackage);         
        if(inputs.size() == 1){
            first = src1Document;
        } else {            
            CTBody srcBody = src1Document.getDocument().getBody();
            first.getDocument().addNewBody().set(srcBody);            
        }        
    }

    public void doMerge() throws Exception{
        first.write(result);                
    }

    public void close() throws Exception{
        result.flush();
        result.close();
        for (InputStream input : inputs) {
            input.close();
        }
    }   
}

And its use:

// Needs imports for java.io.FileInputStream and java.io.FileOutputStream.
public static void main(String[] args) throws Exception {

    FileOutputStream faos = new FileOutputStream("/home/victor/result.docx");

    WordMerge wm = new WordMerge(faos);

    wm.add( new FileInputStream("/home/victor/001.docx") );
    wm.add( new FileInputStream("/home/victor/002.docx") );

    wm.doMerge();
    wm.close();

}

Upvotes: 6

Benedetto Moro

Reputation: 76

I have a suggestion! First, the main method. Its parameters are test1 (the first .docx file name), test2 (the second .docx file name), and dest (the destination file name); document is an XWPFDocument field of the class.

public void mergeDocx(String test1, String test2, String dest) {
    try {
        XWPFDocument doc1 = new XWPFDocument(new FileInputStream(new File(test1)));
        XWPFDocument doc2 = new XWPFDocument(new FileInputStream(new File(test2)));
        document = new XWPFDocument();
        passaElementi(doc1);
        passaElementi(doc2);
        passaStili(doc1, doc2);
        OutputStream out = new FileOutputStream(new File(dest));
        document.write(out);
        out.close();
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

The private method 'passaElementi' copies the body elements from doc1 into the document object. I don't know what an XWPFSDT object is, so that branch is left empty. (Pay attention: I don't copy the whole document, only the body! Headers, sections, and footers can be handled similarly.) The integer variables i and j are fields of the class, both 0 at the beginning.

private void passaElementi(XWPFDocument doc1){

    for(IBodyElement e : doc1.getBodyElements()){
        if(e instanceof XWPFParagraph){
            XWPFParagraph p = (XWPFParagraph) e;
            if(p.getCTP().getPPr()!=null && p.getCTP().getPPr().getSectPr()!=null){
                continue;
            }else{
                document.createParagraph();
                document.setParagraph(p, i);
                i++;
            }
        }else if(e instanceof XWPFTable){
            XWPFTable t = (XWPFTable)e;
            document.createTable();
            document.setTable(j, t);
            j++;
        }else if(e instanceof XWPFSDT){
            // no idea what to do here!
        }

    }

}

The private method 'passaStili' copies the styles from doc1 and doc2 into the document object:

private void passaStili(XWPFDocument doc1, XWPFDocument doc2){
    try {
        CTStyles c1 = doc1.getStyle();
        CTStyles c2 =  doc2.getStyle();
        int size1 = c1.getStyleList().size();
        int size2 = c2.getStyleList().size();
        for(int i = 0; i<size2; i++ ){
            c1.addNewStyle();
            c1.setStyleArray(size1+i, c2.getStyleList().get(i));
        }


        document.createStyles().setStyles(c1);
    } catch (XmlException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

I skipped proper exception handling to keep it short! Leave a like if you liked it! Best regards!

B.M.

Upvotes: 2

Pooria Alimardani

Reputation: 39

When you use HWPFDocument, you should use a .doc file (not .docx).
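
If you are not sure which format a file really is (the extension can be misleading), here is a minimal sketch that looks at the first bytes of the file. It uses only the JDK, not POI. The class name WordFormatSniffer and its Format enum are made up for this example, and it assumes Java 11+ because of readNBytes. Keep in mind that a ZIP signature only shows the file is a ZIP container, which is what .docx uses; it does not prove the file is a Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of POI: guesses the container format from the file header.
final class WordFormatSniffer {

    enum Format { OLE2_DOC, OOXML_DOCX, UNKNOWN }

    // Signature of an OLE2 compound file (used by the old binary .doc format).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Signature of a ZIP local file header ("PK" 03 04); .docx files are ZIP containers.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a header that has already been read into memory.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2_DOC;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.OOXML_DOCX;
        }
        return Format.UNKNOWN;
    }

    // Reads up to the first 8 bytes of the file and classifies them.
    static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }

    // True if data begins with the given prefix; short inputs never match.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

For the files in the question, WordFormatSniffer.detect(Path.of("Merger1.docx")) returning OOXML_DOCX would mean XWPFDocument is the class to use, while OLE2_DOC would mean HWPFDocument is the right one.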

Upvotes: 0

Jean Logeart

Reputation: 53809

You should use XWPFDocument instead of HWPFDocument.

The documentation states:

The partner to HWPF for the new Word 2007 .docx format is XWPF. Whilst HWPF and XWPF provide similar features, there is not a common interface across the two of them at this time.

Change your code to:

XWPFDocument doc = new XWPFDocument(new FileInputStream("..."));
XWPFDocument doc2 = new XWPFDocument(new FileInputStream("...")); 
XWPFDocument doc3 = new XWPFDocument(new FileInputStream("..."));

Upvotes: 1
