Reputation:
Does anyone know how to merge (concatenate) docx documents with PHP (or Python if it's not possible in PHP)?
To clarify, my server is Linux-based. I have two existing docx documents, and I need to combine them into a new docx document using PHP or possibly Python.
Upvotes: 2
Views: 19447
Reputation: 939
Aspose.Words Cloud SDK for PHP can merge several Word documents into one while keeping the formatting of either the appended or the destination document, depending on the ImportFormatMode parameter value. Note that it is a commercial API, but the free pricing plan allows 150 API calls per month.
<?php
require_once('D:\xampp\htdocs\aspose-words-cloud-php-master\vendor\autoload.php');
//TODO: Get your ClientId and ClientSecret at https://dashboard.aspose.cloud (free registration is required).
$ClientSecret="xxxxxxxxxxxxxxxxxxxxxxxxxxxx";
$ClientId="xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx";
$wordsApi = new Aspose\Words\WordsApi($ClientId,$ClientSecret);
try {
$remoteDataFolder = "Temp";
$localFile = "C:/Temp/02_pages_adobe.docx";
$remoteFileName = "02_pages_adobe.docx";
$localFile1 = "C:/Temp/Sections.docx";
$remoteFileName1 = "Sections.docx";
$outputFileName = "TestAppendDocument.docx";
$uploadRequest = new Aspose\Words\Model\Requests\UploadFileRequest($localFile,$remoteDataFolder."/".$remoteFileName,null);
$wordsApi->uploadFile($uploadRequest);
$uploadRequest1 = new Aspose\Words\Model\Requests\UploadFileRequest($localFile1,$remoteDataFolder."/".$remoteFileName1,null);
$wordsApi->uploadFile($uploadRequest1);
$requestDocumentListDocumentEntries0 = new Aspose\Words\Model\DocumentEntry(array(
"href" => $remoteDataFolder . "/" . $remoteFileName1,
"import_format_mode" => "KeepSourceFormatting",
));
$requestDocumentListDocumentEntries = [
$requestDocumentListDocumentEntries0,
];
$requestDocumentList = new Aspose\Words\Model\DocumentEntryList(array(
"document_entries" => $requestDocumentListDocumentEntries,
));
$request = new Aspose\Words\Model\Requests\AppendDocumentRequest(
$remoteFileName,
$requestDocumentList,
$remoteDataFolder,
NULL,
NULL,
NULL,
$remoteDataFolder . "/" . $outputFileName,
NULL,
NULL
);
$result = $wordsApi->appendDocument($request);
##Download file
$request = new Aspose\Words\Model\Requests\DownloadFileRequest($remoteDataFolder."/".$outputFileName,NULL,NULL);
$result = $wordsApi->downloadFile($request);
copy($result->getPathName(),"AppendOutput.docx");
} catch (Exception $e) {
echo "Something went wrong: ", $e->getMessage(), PHP_EOL;
}
?>
P.S.: I'm a developer evangelist at Aspose.
Upvotes: 0
Reputation: 416
You can merge two Word documents with PHPDocX in a single method call (source: Merging Word documents with PHPDocX):
require_once 'path /classes/DocxUtilities.inc';
$newDocx = new DocxUtilities();
$myOptions = array('mergeType' => 0);
$newDocx->mergeDocx('firstWordDoc.docx', 'secondWordDoc.docx', 'mergedWord.docx',
$myOptions);
This merge lets you preserve the whole section structure (paper size, margins, associated headers and footers, ...), includes all the required styles, manages all lists (this may seem trivial, but it is not so in the OOXML standard), and preserves images and charts as well as footnotes, endnotes and comments.
Moreover, there is an option to preserve the original numbering (by default the page numbering continues).
Via the mergeType option, you may also discard the section structure of the merged document and add its content at the end of the first document as part of its last section. In this case, of course, the headers and footers are not imported, but all other elements are still preserved.
Upvotes: 0
Reputation: 5552
Merging two different Docx files can be very complicated because headers, styles, charts, comments, tracked changes and other special contents are saved in separate inner XML sub-files in each Docx. Thus, two Docx files may contain different objects that have the same ids. It would be a huge job to list all possible objects in the two documents, give them new inner ids, and reassign them in a single document. Probably only MS Office can do this currently.
Nevertheless, if you know that your two documents to be merged have the same styles, and if you know you have no charts, headers and other special objects, then the merging becomes something quite easy to perform.
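One rough way to check that assumption is to look inside the archive for parts that usually hold headers, footers, comments or charts. The following is a minimal Python sketch using only the standard library; the function name `special_parts` and the name patterns are assumptions for illustration. They follow the part names Word commonly uses, so this is a heuristic, not a guarantee.

```python
import re
import zipfile

# Heuristic patterns for parts that a simple body copy would not carry over.
# These names are the ones Word commonly uses; other producers may differ.
SPECIAL_PARTS = re.compile(
    r"^word/(header\d*\.xml|footer\d*\.xml|comments\.xml|charts/.+)$"
)


def special_parts(docx_path):
    """Return the archive members that look like headers, footers, comments or charts."""
    with zipfile.ZipFile(docx_path) as archive:
        return sorted(name for name in archive.namelist() if SPECIAL_PARTS.match(name))
```

If `special_parts('doc1.docx')` and `special_parts('doc2.docx')` both return an empty list, the simple approach is more likely to be safe. Styles still have to be compared separately.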
In this case, you only have to use a Zip reader, such as TbsZip, to open the first Docx file (which is technically a zip archive containing XML sub-files), then read the sub-file "word/document.xml" and extract the part between the tags <w:body> and </w:body>. In the second Docx file, open "word/document.xml" and insert the extracted content just before the tag </w:body>. Save the result in a new Docx file.
This can be done using TbsZip, like this:
<?php
include_once('tbszip.php');
$zip = new clsTbsZip();
// Open the first document
$zip->Open('doc1.docx');
$content1 = $zip->FileRead('word/document.xml');
$zip->Close();
// Extract the content of the first document
$p = strpos($content1, '<w:body');
if ($p===false) exit("Tag <w:body> not found in document 1.");
$p = strpos($content1, '>', $p);
$content1 = substr($content1, $p+1);
$p = strpos($content1, '</w:body>');
if ($p===false) exit("Tag </w:body> not found in document 1.");
$content1 = substr($content1, 0, $p);
// Insert into the second document
$zip->Open('doc2.docx');
$content2 = $zip->FileRead('word/document.xml');
$p = strpos($content2, '</w:body>');
if ($p===false) exit("Tag </w:body> not found in document 2.");
$content2 = substr_replace($content2, $content1, $p, 0);
$zip->FileReplace('word/document.xml', $content2, TBSZIP_STRING);
// Save the merge into a third file
$zip->Flush(TBSZIP_FILE, 'merge.docx');
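Since the question also allows Python, the same steps can be sketched with the standard `zipfile` module. This is only a sketch under the same restrictions as above (same styles, no charts, headers or other special objects). The function names `extract_body` and `merge_docx` are made up for illustration, and the code assumes that "word/document.xml" is encoded as UTF-8.

```python
import zipfile


def extract_body(document_xml):
    """Return the text between the <w:body ...> and </w:body> tags."""
    start = document_xml.find("<w:body")
    if start == -1:
        raise ValueError("Tag <w:body> not found.")
    start = document_xml.find(">", start) + 1
    end = document_xml.find("</w:body>", start)
    if end == -1:
        raise ValueError("Tag </w:body> not found.")
    return document_xml[start:end]


def merge_docx(first_path, second_path, out_path):
    """Insert the body of the first document at the end of the second one."""
    # Read the body of the first document (assumed to be UTF-8).
    with zipfile.ZipFile(first_path) as first:
        body1 = extract_body(first.read("word/document.xml").decode("utf-8"))

    # Copy every member of the second document into the output archive,
    # changing only word/document.xml.
    with zipfile.ZipFile(second_path) as second, \
            zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as out:
        for item in second.infolist():
            data = second.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                pos = xml.find("</w:body>")
                if pos == -1:
                    raise ValueError("Tag </w:body> not found in document 2.")
                data = (xml[:pos] + body1 + xml[pos:]).encode("utf-8")
            out.writestr(item, data)
```

Calling `merge_docx('doc1.docx', 'doc2.docx', 'merge.docx')` mirrors the PHP snippet: the content of doc1 ends up after the content of doc2 in the new file.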
Upvotes: 7