CErdmier

Reputation: 33

Show docx properties (title, tags) using PHP for files in dir

I am using the code below (taken from this site) to list all docx, xlsx and pdf files in a directory and link to them.

I would like to show docx properties such as Title, Author and any tags that have been added to the document. Is there a way to display those properties using just PHP?

<div id="container">

<table class="sortable">
  <thead>
        <tr>
          <th>Filename</th>
          <th>Date Modified</th>
        </tr>
  </thead>
  <tbody>
    <div align="center">
      <?php
        // Opens directory
        $myDirectory=opendir(".");

       // Set Accepted Files
        $acceptExts = array("docx", "pdf", "xlsx");

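        // Gets Each Entry
        $dirArray = array();
        while(($entryName = readdir($myDirectory)) !== false) {
          // Compare the last extension, case-insensitively
          $ext = strtolower(pathinfo($entryName, PATHINFO_EXTENSION));
          if(in_array($ext, $acceptExts)) {
            $dirArray[] = $entryName;
          }
        }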

        // Finds extensions of files
        function findexts ($filename) {
          // split() was removed in PHP 7; pathinfo() returns the last extension
          return strtolower(pathinfo($filename, PATHINFO_EXTENSION));
        }

        // Closes directory
        closedir($myDirectory);

        // Counts elements in array
        $indexCount=count($dirArray);

        // Sorts files
        sort($dirArray);

        // Loops through the array of files
        for($index=0; $index < $indexCount; $index++) {

          // Allows ./?hidden to show hidden files
          if($_SERVER['QUERY_STRING']=="hidden")
          {$hide="";
          $ahref="./";
          $atext="Hide";}
          else
          {$hide=".";
          $ahref="./?hidden";
          $atext="Show";}
          if(substr("$dirArray[$index]", 0, 1) != $hide) {

          // Gets File Names
          $name=$dirArray[$index];
          $namehref=$dirArray[$index];

          // Gets Extensions 
          $extn=findexts($dirArray[$index]); 

          // Gets file size 
          $size=number_format(filesize($dirArray[$index]));

          // Gets Date Modified Data
          $modtime=date("M j Y", filemtime($dirArray[$index]));
          $timekey=date("Ymd", filemtime($dirArray[$index]));

          // Separates directories
          if(is_dir($dirArray[$index])) {
            $extn="&lt;Directory&gt;"; 
            $size="&lt;Directory&gt;"; 
            $class="dir";
          } else {
            $class="file";
          }

          // Cleans up . and .. directories 
          if($name=="."){$name=". (Current Directory)"; $extn="&lt;System Dir&gt;";}
          if($name==".."){$name=".. (Parent Directory)"; $extn="&lt;System Dir&gt;";}

          //Display to screen
          print("
          <tr class='$class'>
            <td><a href='./$namehref'>$name</a></td>
            <td sorttable_customkey='$timekey'><a href='./$namehref'>$modtime</a></td>
          </tr>");
          }
        }
      ?>

Upvotes: 3

Views: 2755

Answers (1)

Giacomo1968

Reputation: 26066

I would like to show docx properties such as Title, Author and any tags that have been added to the document. Is there a way to display those properties using just PHP?

What you are looking for is a tool that can extract metadata from a file. Metadata is simply data that describes the data in a file or object, and once you understand that, half the job is done. The rest involves finding the tool that works best for your needs.

If you want a pure PHP solution, look into getID3, a well-developed PHP library that should be able to handle the task. It is aimed mainly at media files, so I am not 100% sure about its capabilities with DOCX and other Microsoft formats, but it is worth looking at.

Also, the PHPOffice project maintains a PHP library called PHPWord that lets you read and manipulate the contents of DOCX and related documents, so I assume metadata extraction is part of the mix.
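
If you would rather not depend on a library at all, it helps to know that a DOCX (or XLSX) file is a ZIP archive, and the Title, Author and Tags fields live in a small XML part named docProps/core.xml. The sketch below reads that part with PHP's ZipArchive and DOMDocument classes. It assumes the zip and dom extensions are enabled, and readCoreProperties() is just a hypothetical helper name, not part of any library.

```php
<?php
// Sketch: read core properties from an OOXML file (docx/xlsx) using only
// built-in PHP classes. readCoreProperties() is a hypothetical helper name.
function readCoreProperties(string $path): array
{
    $props = ['title' => '', 'author' => '', 'tags' => ''];

    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return $props; // not a readable ZIP archive
    }
    $xml = $zip->getFromName('docProps/core.xml');
    $zip->close();
    if ($xml === false) {
        return $props; // archive has no core properties part
    }

    $doc = new DOMDocument();
    if (!@$doc->loadXML($xml)) {
        return $props; // malformed XML
    }

    // Namespaces used by the core properties part.
    $dc = 'http://purl.org/dc/elements/1.1/';
    $cp = 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties';

    // Word's "Author" is stored as dc:creator and "Tags" as cp:keywords.
    $wanted = [
        'title'  => [$dc, 'title'],
        'author' => [$dc, 'creator'],
        'tags'   => [$cp, 'keywords'],
    ];
    foreach ($wanted as $key => [$ns, $name]) {
        $node = $doc->getElementsByTagNameNS($ns, $name)->item(0);
        if ($node !== null) {
            $props[$key] = trim($node->textContent);
        }
    }
    return $props;
}
```

Inside your loop you could call `$props = readCoreProperties($dirArray[$index]);` for the docx and xlsx entries and print `htmlspecialchars($props['title'])` in an extra table cell. PDF files store their metadata differently, so this helper simply returns empty strings for them.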

Beyond PHP-specific libraries, if you are on Linux or a Unix variant like Mac OS X, look into a tool like exiftool, which I have used and highly recommend. Yes, it is a separate system program rather than PHP code, but you can call it via exec() in PHP to get it to work its magic.
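
A rough sketch of the exec() route follows. It assumes exiftool is installed and on the web server's PATH, and that the properties you want are reported under the tag names Title, Creator and Keywords; check the tool's output for your own files before relying on those names. Both function names are hypothetical helpers.

```php
<?php
// Sketch: call the exiftool command-line program and decode its JSON output.
// parseExiftoolJson() and exiftoolProperties() are hypothetical helper names.

function parseExiftoolJson(string $json): array
{
    // "exiftool -json" prints a JSON array with one object per input file.
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data[0]) || !is_array($data[0])) {
        return [];
    }
    return $data[0];
}

function exiftoolProperties(string $path): array
{
    // escapeshellarg() keeps odd filenames from being interpreted by the shell.
    // The tag names below are assumptions; adjust them to what exiftool reports.
    $cmd = 'exiftool -json -Title -Creator -Keywords ' . escapeshellarg($path);

    $lines = [];
    $status = 0;
    exec($cmd, $lines, $status);
    if ($status !== 0) {
        return []; // exiftool missing, or the file could not be read
    }
    return parseExiftoolJson(implode("\n", $lines));
}
```

The returned array can then be read like `$info['Title'] ?? ''`. Starting one process per file is slower than reading the file in PHP, so for large directories it may be worth caching the results.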

Looking at your specific code: since it only gets the directory contents via readdir, you would have to add some logic that takes those filenames and paths and passes each actual file to getID3, PHPWord or exiftool to read the data.

Looking quickly at how the loop in your code works, note this line that gets the file size:

// Gets file size 
$size=number_format(filesize($dirArray[$index]));

Before or after that line, you would need to do something like this:

// Loads the library (adjust the path to wherever getID3 is installed).
require_once 'getid3/getid3.php';

// Gets file info metadata.
$getID3 = new getID3;
$file_info = $getID3->analyze($dirArray[$index]);

Then $file_info would be an array of data about the file named in $dirArray[$index]. How do you access that data? I can't say offhand, but you can see what it grabbed by dumping the contents of $file_info like this:

echo '<pre>';
print_r($file_info);
echo '</pre>';

Then figure out where the data you want resides in $file_info and access it like any other array.
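
If the dump is large, a small recursive search can make that hunt easier. The sketch below lists every path in a nested array whose key contains a given word. findKeyPaths() is a hypothetical helper, not part of getID3, and it makes no assumption about how the metadata array is actually laid out.

```php
<?php
// Sketch: list every path inside a nested array whose key contains $needle.
// findKeyPaths() is a hypothetical helper, not part of getID3.
function findKeyPaths(array $data, string $needle, string $prefix = ''): array
{
    $paths = [];
    foreach ($data as $key => $value) {
        // Build a dotted path such as "outer.inner.key".
        $path = $prefix === '' ? (string) $key : $prefix . '.' . $key;

        // Case-insensitive match on the key name.
        if (stripos((string) $key, $needle) !== false) {
            $paths[] = $path;
        }

        // Descend into nested arrays.
        if (is_array($value)) {
            $paths = array_merge($paths, findKeyPaths($value, $needle, $path));
        }
    }
    return $paths;
}
```

For example, `print_r(findKeyPaths($file_info, 'title'));` shows each place a key containing "title" appears, which you can then read with normal array syntax.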

Upvotes: 4
