Othman

Reputation: 3018

Read Word document in PHP

I'm working on a project and I'm stuck on reading Word documents.

Word file content:

This is a test word file in PHP.

Thank you.

PHP code:

    $myFile = "wordfile.docx";
    $fh = fopen($myFile, 'r');
    $theData = fread($fh, 1000);
    fclose($fh);
    echo $theData;

Output:

PK!éQ°Â[Content_Types].xml ¢( ´”MOÂ@†ï&þ‡f¯¦]ð`Œ¡pP<*‰Ïëv
 «Ýì,_ÿÞiI¡(ziÒNß÷}fÚÞ`©‹h•5)ë&‘6Sf’²×ñc|Ë"Âd¢°R¶dƒþåEo
 ¼r€© ¦l‚»ãå´ÀÄ:0TÉ­×"ЭŸp'䧘¿îtn¸´&€  q(=X¿÷¹˜!.éñ
 š„ä,º_¿WF¥L8W()ò²Êu <"œ›l.Þ%¤¬Ìqª^Nøp0ÙKPºl­*Õ3Ó
 «¢‘ðáIhbçë3žY9ÓÔwr¼¹F›çJB­/Ýœ·é;é"©+Z(³e?ÈaUþ=ÅÚ÷Ä
 ø7¦Ã<I?Hû<4ÆeÓÉ:bGÛž!ÐN    ùþÛÆmCÇs+ÂÞ_þbǼ$§ó4ïœ
 0ñ£¶n…´#€W×îٕͱH:#oÒÎñ¿h{»JuLGÎ êõÐtÄêDZXg÷åFÌ kÈæÕîÿÿPK
 !ÇÂ'¼ß_rel

Is there any way to read the Word document in PHP?

Upvotes: 8

Views: 55958

Answers (5)

Marina DU

Reputation: 111

The following function is similar to the one in @Sudhir's answer, but written for PHP 8:

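    function readDocx($filename)
    {
        $zip = new ZipArchive();
        // open() returns true on success and an integer error code on failure;
        // error codes are truthy, so compare strictly against true.
        if ($zip->open($filename) !== true) {
            return false;
        }

        $content = $zip->getFromName("word/document.xml");
        $zip->close();
        if ($content === false) {
            return false; // the main document part is missing
        }

        // Separate table cells with a space and paragraphs with a line break.
        $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
        $content = str_replace('</w:r></w:p>', "\r\n", $content);

        return strip_tags($content);
    }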

The procedural zip functions (zip_open(), zip_read(), and so on) are deprecated as of PHP 8.0; ZipArchive replaces them.

Upvotes: 6

user2912903

Reputation: 192

"PHPWord is a library written in pure PHP that provides a set of classes to write to and read from different document file formats." (PHPOffice, 2016)

This open-source PHP library should solve your problem. You can either download it or install it with Composer:

https://github.com/PHPOffice/PHPWord

Upvotes: 6

Sudhir

Reputation: 837

For docx files, use this function:

    function read_docx($filename){

        $content = '';

        if(!$filename || !file_exists($filename)) return false;

        // zip_open() returns an integer error code on failure
        $zip = zip_open($filename);
        if (!$zip || is_numeric($zip)) return false;

        while ($zip_entry = zip_read($zip)) {

            // Check the name first so that only the matching entry is opened
            if (zip_entry_name($zip_entry) != "word/document.xml") continue;

            if (zip_entry_open($zip, $zip_entry) == FALSE) continue;

            $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

            zip_entry_close($zip_entry);
        }
        zip_close($zip);

        // Separate table cells with a space and paragraphs with a line break
        $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
        $content = str_replace('</w:r></w:p>', "\r\n", $content);

        // Drop the remaining XML tags, leaving only the text
        return strip_tags($content);
    }

It returns the text content of the docx file, or false if the file cannot be opened.

Upvotes: 17

Andreas Wong

Reputation: 60506

A Word document isn't stored like a plain text file (it's closer to an XML/binary file), so you can't just echo its contents and expect to see the human-readable portion of the docx file.
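As a minimal sketch of why the raw bytes look like that: the output in the question begins with `PK`, which is the signature of a zip archive. The helper name `looksLikeZip()` is made up for illustration, and the check only covers the usual local-file-header signature `PK\x03\x04`.

```php
<?php
// Hypothetical helper (not part of PHP): report whether a file starts with
// the zip local-file-header signature "PK\x03\x04", as docx files do.
function looksLikeZip($filename)
{
    $fh = @fopen($filename, 'rb');   // binary-safe read
    if ($fh === false) {
        return false;                // missing or unreadable file
    }
    $magic = fread($fh, 4);          // the first four bytes of the file
    fclose($fh);

    return $magic === "PK\x03\x04";
}
```

If this returns true for wordfile.docx, the file has to be unpacked as an archive before any text can be read from it.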

There's a library that could do what you want, but it only accepts doc files:

Docvert

Upvotes: 1

Francis Avila

Reputation: 31621

"docx" is different from "doc". Docx files are basically XML files in a zip container (as described by Wikipedia). Doc files are binary blobs.
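To see the container for yourself, here is a small sketch that lists the entries inside a docx file with ZipArchive. The function name `listDocxEntries()` is an assumption for illustration. The output in the question already hints at one entry, `[Content_Types].xml`.

```php
<?php
// Hypothetical helper: return the names of all entries in a zip container,
// or false if the file cannot be opened as a zip archive.
function listDocxEntries($filename)
{
    $zip = new ZipArchive();
    // open() returns true on success, an integer error code otherwise
    if ($zip->open($filename) !== true) {
        return false;
    }

    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);   // entry name at index $i
    }
    $zip->close();

    return $names;
}
```

For a docx file the list should include `word/document.xml`, which holds the main body text.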

I am aware of no library that can easily read docx files in PHP (although Phpdocx can write them). However, since these are just zip files and XML files, you should be able to put something together using ZipArchive to open the docx container and DOMDocument, SimpleXML, XMLReader, or XSLTProcessor to read the XML documents themselves.
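A minimal sketch of that approach, assuming the body text lives in `word/document.xml` and that paragraphs and text runs are the `w:p` and `w:t` elements of the WordprocessingML namespace. The function name `docxParagraphs()` is made up for illustration, and the sketch ignores tabs, explicit line breaks, headers, and footnotes.

```php
<?php
// Hypothetical helper: return the text of each paragraph in a docx file
// as an array of strings, or false on failure.
function docxParagraphs($filename)
{
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        return false;                         // not a readable zip archive
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return false;                         // main document part missing
    }

    $dom = new DOMDocument();
    if (!@$dom->loadXML($xml)) {
        return false;                         // malformed XML
    }

    // WordprocessingML main namespace (the "w:" prefix in the raw XML)
    $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

    $paragraphs = [];
    foreach ($dom->getElementsByTagNameNS($ns, 'p') as $p) {
        $text = '';
        // Concatenate every text run inside this paragraph
        foreach ($p->getElementsByTagNameNS($ns, 't') as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }

    return $paragraphs;
}
```

Usage could look like `echo implode("\r\n", docxParagraphs("wordfile.docx"));`. Compared with `strip_tags()`, parsing the XML also decodes entities such as `&amp;` and does not depend on the exact order of closing tags.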

Upvotes: 2
