Reputation: 918
I am building a website where authors can create EPUB files. Users will upload their books in .doc format, and I need to generate an EPUB file from each upload. A single .doc file will contain multiple chapters, so I need to parse the file and split it into chapters. Authors will use the Heading 1 style for their chapter titles.
So, in PHP, is there any way to parse .doc files to HTML and split the result into chapters at each Heading 1, so that I can create the EPUB file?
After some research, I found a Linux application, but I think it converts .doc to plain text, so I would not be able to split the chapters.
Please suggest a solution if you have one. Thanks in advance.
Upvotes: 3
Views: 2541
Reputation: 68476
You can achieve this using the PHPDOCX API.
First, try generating XHTML from your Word document, as described in this function reference.
Something like this:
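// Load the PHPDOCX transformation class (adjust the path to your install).
require_once '../../classes/TransformDoc.inc';
$document = new TransformDoc();
// Point the transformer at the uploaded Word file.
$document->setStrFile('../files/Text.docx');
// Convert the document to XHTML, then validate the generated markup.
$document->generateXHTML();
$document->validatorXHTML();
// Output the resulting XHTML string.
echo $document->getStrXHTML();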
Once you have the XHTML content, you can process it further, for example by splitting it into chapters or removing a chapter.
Complete documentation can be found here.
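As a rough illustration of the chapter-splitting step, here is a minimal sketch using PHP's built-in DOM extension rather than anything from PHPDOCX. It assumes the generated XHTML turns each Heading 1 into an `<h1>` element that is a direct child of `<body>`, which may not hold for every document. The function name `splitIntoChapters` is hypothetical, and any content before the first `<h1>` is simply dropped.

```php
<?php
// Hypothetical helper (not part of PHPDOCX): split an XHTML string into
// chapters, starting a new chapter at every top-level <h1>.
function splitIntoChapters(string $xhtml): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    // Assumes the markup declares its own character encoding.
    $doc->loadHTML($xhtml);
    libxml_clear_errors();

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return [];
    }

    $chapters = [];
    $current = null; // chapter being built, or null before the first <h1>

    foreach ($body->childNodes as $node) {
        $isHeading = $node->nodeType === XML_ELEMENT_NODE
            && strtolower($node->nodeName) === 'h1';

        if ($isHeading) {
            // Close the previous chapter and open a new one.
            if ($current !== null) {
                $chapters[] = $current;
            }
            $current = ['title' => trim($node->textContent), 'html' => ''];
            continue;
        }

        // Append everything after a heading to the current chapter.
        if ($current !== null) {
            $current['html'] .= $doc->saveHTML($node);
        }
    }

    if ($current !== null) {
        $chapters[] = $current;
    }

    return $chapters;
}
```

Each returned entry holds a `title` and an `html` string, so you could pass the output of `$document->getStrXHTML()` to this function and write every entry out as a separate chapter file for the EPUB.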
Upvotes: 1