Reputation: 101
I am trying to merge two Word documents, say Merger1.doc and Merger2.doc, and store the result in a new file, doc2.docx.
I used the code below to do this, but it throws an error.
CODE:
import java.io.*;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Range;

public class MergerFiles {
    public static void main(String[] args) throws Exception {
        // POI apparently can't create a document from scratch,
        // so we need an existing empty dummy document
        HWPFDocument doc = new HWPFDocument(new FileInputStream("C:\\Users\\pallavi123\\Desktop\\Merger1.docx"));
        Range range = doc.getRange();

        // I can get the entire document and insert it in the tmp.doc.
        // However, any formatting in my Word document is lost.
        HWPFDocument doc2 = new HWPFDocument(new FileInputStream("C:\\Users\\pallavi123\\Desktop\\Merger2.docx"));
        Range range2 = doc2.getRange();
        range.insertAfter(range2.text());

        // I can get the information (text only) for each character run/paragraph or section.
        // Again, any formatting in my Word document is lost.
        HWPFDocument doc3 = new HWPFDocument(new FileInputStream("D:\\doc2.docx"));
        Range range3 = doc3.getRange();
        for (int i = 0; i < range3.numCharacterRuns(); i++) {
            CharacterRun run3 = range3.getCharacterRun(i);
            range.insertAfter(run3.text());
        }

        OutputStream out = new FileOutputStream("D:\\result.doc");
        doc.write(out);
        out.flush();
        out.close();
    }
}
ERROR CODE:
Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:108)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
at org.apache.poi.hwpf.HWPFDocument.verifyAndBuildPOIFS(HWPFDocument.java:120)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:133)
at MergerFiles.main(MergerFiles.java:11)
Am I missing a jar file, or am I using the API incorrectly? I would appreciate your suggestions.
Thanks in advance.
Upvotes: 2
Views: 15487
Reputation: 862
I've developed the following class:
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;

public class WordMerge {

    private final OutputStream result;
    private final List<InputStream> inputs;
    private XWPFDocument first;

    public WordMerge(OutputStream result) {
        this.result = result;
        inputs = new ArrayList<>();
    }

    public void add(InputStream stream) throws Exception {
        inputs.add(stream);
        OPCPackage srcPackage = OPCPackage.open(stream);
        XWPFDocument src1Document = new XWPFDocument(srcPackage);
        if (inputs.size() == 1) {
            // The first document added becomes the merge target.
            first = src1Document;
        } else {
            // Add a new body to the first document and fill it with this document's body.
            CTBody srcBody = src1Document.getDocument().getBody();
            first.getDocument().addNewBody().set(srcBody);
        }
    }

    // Writes the merged document to the output stream.
    public void doMerge() throws Exception {
        first.write(result);
    }

    // Flushes and closes the output, then closes every input stream.
    public void close() throws Exception {
        result.flush();
        result.close();
        for (InputStream input : inputs) {
            input.close();
        }
    }
}
And its use:
public static void main(String[] args) throws Exception {
    FileOutputStream faos = new FileOutputStream("/home/victor/result.docx");
    WordMerge wm = new WordMerge(faos);
    wm.add(new FileInputStream("/home/victor/001.docx"));
    wm.add(new FileInputStream("/home/victor/002.docx"));
    wm.doMerge();
    wm.close();
}
Upvotes: 6
Reputation: 76
Here is a suggestion. First, the main method. Its parameters are test1 (the first .docx file name), test2 (the second .docx file name), and dest (the destination file name); document is a field of the class.
public void mergeDocx(String test1, String test2, String dest) {
    try {
        XWPFDocument doc1 = new XWPFDocument(new FileInputStream(new File(test1)));
        XWPFDocument doc2 = new XWPFDocument(new FileInputStream(new File(test2)));
        document = new XWPFDocument();
        passaElementi(doc1);
        passaElementi(doc2);
        passaStili(doc1, doc2);
        OutputStream out = new FileOutputStream(new File(dest));
        document.write(out);
        out.close();
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
The private method 'passaElementi' copies the body elements from doc1 into the document object. I don't know what an XWPFSDT object is, so those elements are skipped. Note that this copies only the body, not the whole document; headers, sections, and footers can be handled in a similar way. The integer fields i and j are class-level counters that start at 0.
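private void passaElementi(XWPFDocument doc1) {
    for (IBodyElement e : doc1.getBodyElements()) {
        if (e instanceof XWPFParagraph) {
            XWPFParagraph p = (XWPFParagraph) e;
            if (p.getCTP().getPPr() != null && p.getCTP().getPPr().getSectPr() != null) {
                // Skip paragraphs that carry section properties.
                continue;
            } else {
                // Create an empty paragraph, then overwrite it with the source paragraph.
                document.createParagraph();
                document.setParagraph(p, i);
                i++;
            }
        } else if (e instanceof XWPFTable) {
            XWPFTable t = (XWPFTable) e;
            // Create an empty table, then overwrite it with the source table.
            document.createTable();
            document.setTable(j, t);
            j++;
        } else if (e instanceof XWPFSDT) {
            // Not handled: I don't know what to do with this element type.
        }
    }
}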
The private method 'passaStili' copies the styles from doc1 and doc2 into the document object:
private void passaStili(XWPFDocument doc1, XWPFDocument doc2) {
    try {
        CTStyles c1 = doc1.getStyle();
        CTStyles c2 = doc2.getStyle();
        int size1 = c1.getStyleList().size();
        int size2 = c2.getStyleList().size();
        for (int i = 0; i < size2; i++) {
            c1.addNewStyle();
            c1.setStyleArray(size1 + i, c2.getStyleList().get(i));
        }
        document.createStyles().setStyles(c1);
    } catch (XmlException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
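One caveat with appending every style from doc2 onto doc1's list: both documents usually define some of the same style IDs (for example "Normal"), so the merged list may contain duplicates. Below is a minimal, POI-free sketch of one way to avoid that, keeping the first definition of each ID. The Style class and the mergeById method are hypothetical stand-ins invented for illustration; in real code the ID would come from each style's w:styleId attribute.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not POI code: merges two style lists without duplicating IDs.
class StyleMergeSketch {

    // Stand-in for a real style entry (assumption: a style is identified by its ID).
    static final class Style {
        final String id;
        final String definition;

        Style(String id, String definition) {
            this.id = id;
            this.definition = definition;
        }
    }

    // Keeps every style from 'first', then adds only the styles from 'second' whose ID is new.
    static List<Style> mergeById(List<Style> first, List<Style> second) {
        // LinkedHashMap preserves insertion order, so the original ordering is kept.
        Map<String, Style> byId = new LinkedHashMap<>();
        for (Style s : first) {
            byId.putIfAbsent(s.id, s);
        }
        for (Style s : second) {
            byId.putIfAbsent(s.id, s);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Style> a = List.of(new Style("Normal", "from doc1"), new Style("Heading1", "from doc1"));
        List<Style> b = List.of(new Style("Normal", "from doc2"), new Style("Quote", "from doc2"));
        for (Style s : mergeById(a, b)) {
            System.out.println(s.id + " (" + s.definition + ")");
        }
    }
}
```

Keeping the first definition is only one possible policy; if the two documents define the same ID differently, text from the second document will pick up the first document's formatting.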
Exception handling is minimal to keep the example short. Leave a like if it helped! Best regards!
B.M.
Upvotes: 2
Reputation: 53809
You should use XWPFDocument instead of HWPFDocument, because your input files are .docx files, as the exception message indicates.
The documentation states:
The partner to HWPF for the new Word 2007 .docx format is XWPF. Whilst HWPF and XWPF provide similar features, there is not a common interface across the two of them at this time.
Change your code to:
XWPFDocument doc = new XWPFDocument(new FileInputStream("..."));
XWPFDocument doc2 = new XWPFDocument(new FileInputStream("..."));
XWPFDocument doc3 = new XWPFDocument(new FileInputStream("..."));
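If you are not sure which format a file really is (the extension can be misleading), you can look at its first bytes. The sketch below is a hypothetical helper, not part of POI: binary .doc files are OLE2 compound files and start with the signature D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP packages and start with 50 4B 03 04. The class and method names are my own, and a ZIP signature only suggests a .docx file, since any ZIP archive starts the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not part of POI): guesses the container format from magic bytes.
class WordFormatSniffer {

    // OLE2 compound file signature, used by binary .doc files (HWPF territory).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature, used by .docx packages (XWPF territory).
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    // Returns "OLE2", "ZIP", or "UNKNOWN" based on the leading bytes.
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Reads up to the first 8 bytes of the file and classifies them.
    static String sniffFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + " -> " + sniffFile(Path.of(name)));
        }
    }
}
```

A result of "OLE2" would point to HWPFDocument and "ZIP" to XWPFDocument. The sketch uses readNBytes and Path.of, so it assumes Java 11 or later.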
Upvotes: 1