Reputation: 153
Can anybody help me extract text from a docx file in PHP?
Or is there a Linux command for this?
I can already extract text from pdf and doc files, so converting docx to pdf or doc in PHP (or with a Linux command) would also work for me.
Upvotes: 2
Views: 1677
Reputation: 11
You can extract the text from a docx file with the code below. It needs PHP's zip extension, which provides the ZipArchive class.
function docx_to_text($filename)
{
    $xml_filename = "word/document.xml"; // main document part inside the .docx archive
    $zip_handle = new ZipArchive;
    $output_text = "";
    // A .docx file is a ZIP archive, so it can be opened directly without a temporary copy.
    if (true === $zip_handle->open($filename))
    {
        if (($xml_index = $zip_handle->locateName($xml_filename)) !== false)
        {
            $xml_datas = $zip_handle->getFromIndex($xml_index);
            // loadXML() is called on an instance (the static call fails on PHP 8).
            // LIBXML_NOENT and LIBXML_XINCLUDE are left out so that entities and
            // XIncludes in an untrusted file are not expanded.
            $xml_handle = new DOMDocument();
            if (is_string($xml_datas) && $xml_datas !== ""
                && $xml_handle->loadXML($xml_datas, LIBXML_NOERROR | LIBXML_NOWARNING))
            {
                $output_text = strip_tags($xml_handle->saveXML());
            }
        }
        $zip_handle->close();
    }
    return $output_text; // empty string if the file could not be read
}
Upvotes: 0
Reputation: 20544
It's quite easy to extract the text from a docx file. You don't even need an external dependency, apart from the zip module, which you have to enable.
<?php
function read_docx($filename) {
    $content = '';
    // Note: the procedural zip_* functions are deprecated as of PHP 8.0.
    $zip = zip_open($filename);
    if (!$zip || is_numeric($zip))
        return false;
    while ($zip_entry = zip_read($zip)) {
        // Check the name first so that only the matching entry gets opened.
        if (zip_entry_name($zip_entry) != "word/document.xml")
            continue;
        if (zip_entry_open($zip, $zip_entry) == FALSE)
            continue;
        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
        zip_entry_close($zip_entry);
    }// end while
    zip_close($zip);
    // Table cell boundaries become spaces, paragraph ends become line breaks.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);
    $stripped_content = strip_tags($content);
    return $stripped_content;
}
echo read_docx("textExample.docx");
Thanks to Muhammad's question
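As a rough sketch of an alternative to the str_replace/strip_tags step above, assuming the DOM extension is enabled: the helper below, docx_xml_to_text() (a made-up name, not part of any library), takes the XML string of word/document.xml and returns one line per w:p paragraph. It only reads w:t, w:tab and w:br elements that sit directly inside a run, so tab-stop definitions in the paragraph properties are not mistaken for text. Text boxes nested inside a paragraph may show up twice, and headers, footers and footnotes live in other parts of the archive, so treat it as a starting point only.

```php
<?php
// Hypothetical helper: converts the XML of word/document.xml to plain text,
// one output line per paragraph.
function docx_xml_to_text(string $xml): string
{
    if ($xml === '') {
        return ''; // loadXML() rejects empty input on PHP 8
    }

    $dom = new DOMDocument();
    // LIBXML_NONET blocks network access during parsing; LIBXML_NOENT is
    // deliberately not passed, so entities are not substituted.
    if (!@$dom->loadXML($xml, LIBXML_NONET | LIBXML_NOERROR | LIBXML_NOWARNING)) {
        return ''; // malformed XML
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $lines = [];
    foreach ($xpath->query('//w:p') as $paragraph) {
        $line = '';
        // The union is returned in document order, so text, tabs and
        // line breaks keep their original sequence.
        $parts = $xpath->query('.//w:r/w:t | .//w:r/w:tab | .//w:r/w:br', $paragraph);
        foreach ($parts as $node) {
            switch ($node->localName) {
                case 't':
                    $line .= $node->textContent;
                    break;
                case 'tab':
                    $line .= "\t";
                    break;
                case 'br':
                    $line .= "\n";
                    break;
            }
        }
        $lines[] = $line;
    }

    return implode("\n", $lines);
}
```

To use it on a real file, the XML string can be obtained with ZipArchive, for example by calling getFromName('word/document.xml') on an opened archive, and then passed to the helper.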
Upvotes: 1
Reputation: 68446
Make use of OpenTBS.
After including it, do this:
include_once('tbs_class.php');
include_once('../tbs_plugin_opentbs.php');
$TBS = new clsTinyButStrong;
$TBS->Plugin(TBS_INSTALL, OPENTBS_PLUGIN);
$TBS->LoadTemplate('filename.docx');
echo $string = $TBS->Source; // the docx content is now in this variable (as the main document's XML, so tags may still need stripping)
Upvotes: 0