Reputation: 3018
I'm working on a project and I'm stuck on reading Word documents.
Word file content:
This is a test word file in PHP.
Thank you.
PHP code:
$myFile = "wordfile.docx";
$fh = fopen($myFile, 'r');
$theData = fread($fh, 1000);
fclose($fh);
echo $theData;
Output:
PK!éQ°Â[Content_Types].xml ¢( ´”MOÂ@†ï&þ‡f¯¦]ð`Œ¡pP<*‰Ïëv
«Ýì,_ÿÞiI¡(ziÒNß÷}fÚÞ`©‹h•5)ë&‘6Sf’²×ñc|Ë"Âd¢°R¶dƒþåEo
¼r€© ¦l‚»ãå´ÀÄ:0TÉ×"Пp'䧘¿îtn¸´&€ q(=X¿÷¹˜!.éñ
š„ä,º_¿WF¥L8W()ò²Êu <"œ›l.Þ%¤¬Ìqª^Nøp0ÙKPºl*Õ3Ó
«¢‘ðáIhbçë3žY9ÓÔwr¼¹F›çJB/Ýœ·é;é"©+Z(³e?ÈaUþ=ÅÚ÷Ä
ø7¦Ã<I?Hû<4ÆeÓÉ:bGÛž!ÐN ùþÛÆmCÇs+ÂÞ_þbǼ$§ó4ïœ
0ñ£¶n…´#€W×îٕͱH:#oÒÎñ¿h{»JuLGÎ êõÐtÄêDZXg÷åFÌ kÈæÕîÿÿPK
!ÇÂ'¼ß_rel
Is there any way to read a Word document in PHP?
Upvotes: 8
Views: 55958
Reputation: 111
The following is a similar function to the one in @suhdir's answer, but for PHP 8:
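function readDocx($filename)
{
    $zip = new ZipArchive();
    // open() returns true on success and an integer error code on failure,
    // so compare strictly against true.
    if ($zip->open($filename) === true) {
        $content = $zip->getFromName("word/document.xml");
        $zip->close();
        if ($content === false) {
            return false; // no main document part in the archive
        }
        // Separate table cells with a space and paragraphs with a line break.
        $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
        $content = str_replace('</w:r></w:p>', "\r\n", $content);
        return strip_tags($content);
    }
    return false;
}
The procedural zip functions (zip_open(), zip_read() and so on) are deprecated as of PHP 8.0; ZipArchive replaces them.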
Upvotes: 6
Reputation: 192
"PHPWord is a library written in pure PHP that provides a set of classes to write to and read from different document file formats." (PHPOffice, 2016)
This open-source PHP library should solve your problem. You can either download it or install it with Composer:
https://github.com/PHPOffice/PHPWord
Upvotes: 6
Reputation: 837
For docx, use this function:
function read_docx($filename){
    if (!$filename || !file_exists($filename)) return false;
    // Note: the procedural zip_* functions are deprecated as of PHP 8.0.
    $zip = zip_open($filename);
    if (!$zip || is_numeric($zip)) return false;
    $content = '';
    while ($zip_entry = zip_read($zip)) {
        // Check the name first so only the matching entry is opened.
        if (zip_entry_name($zip_entry) != "word/document.xml") continue;
        if (zip_entry_open($zip, $zip_entry) == FALSE) continue;
        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
        zip_entry_close($zip_entry);
    }
    zip_close($zip);
    // Separate table cells with a space and paragraphs with a line break.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);
    return strip_tags($content);
}
It returns the text of the docx file.
Upvotes: 17
Reputation: 60506
A Word document isn't stored as conveniently as a text file (a docx is zipped XML, a doc is binary), so you can't just echo the raw bytes and expect to get the human-readable portion of the docx file.
There's a library that could do what you want, but it only accepts doc files.
Upvotes: 1
Reputation: 31621
"docx" is different from "doc". Docx files are basically XML files in a zip container (as described by Wikipedia). Doc files are binary blobs.
I am aware of no library that can easily read docx files in PHP (although Phpdocx can write them). However, since these are just zip files and XML files, you should be able to put something together using ZipArchive to open the docx container and DOMDocument, SimpleXML, XMLReader or XSLTProcessor to read the XML documents themselves.
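A minimal sketch of that idea, using ZipArchive to pull word/document.xml out of the container and DOMDocument with DOMXPath to collect the text of each paragraph. The function name docxToText() is made up for illustration, and the code assumes the usual WordprocessingML namespace URI; it ignores headers, footers, footnotes and formatting, so treat it as a starting point rather than a complete reader.

```php
<?php
// Sketch only: docxToText() is a hypothetical helper, not part of any library.
// Returns the paragraph text joined by "\n", or false on failure.
function docxToText($filename)
{
    $zip = new ZipArchive();
    // open() returns true on success and an integer error code on failure.
    if ($zip->open($filename) !== true) {
        return false;
    }
    // The main document part normally lives at word/document.xml.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return false;
    }

    $dom = new DOMDocument();
    // Suppress parser warnings; a malformed part just yields false.
    if (!@$dom->loadXML($xml)) {
        return false;
    }

    $xpath = new DOMXPath($dom);
    // Assumed namespace: the standard WordprocessingML main namespace.
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $paragraphs = [];
    // One output line per <w:p>; text runs live in <w:t> elements.
    foreach ($xpath->query('//w:p') as $p) {
        $text = '';
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return implode("\n", $paragraphs);
}
```

With the file from the question, this would be called as echo docxToText("wordfile.docx");. Compared with strip_tags(), walking the XML keeps paragraph boundaries explicit and decodes XML entities such as &amp;amp; in the text.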
Upvotes: 2