Reputation: 33
I am using the code below to list all docx, xlsx and pdf files in a directory and link to the files (taken from this site).
I would like to show docx properties such as Title, Author and any tags that have been added to the document. Is there a way to display those properties using just PHP?
<div id="container">
<table class="sortable">
<thead>
<tr>
<th>Filename</th>
<th>Date Modified</th>
</tr>
</thead>
<tbody>
<div align="center">
<?php
// Opens directory
$myDirectory=opendir(".");
// Set Accepted Files
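$acceptExts = array("docx", "pdf", "xlsx");
// Start with an empty list so count() and sort() still work if nothing matches
$dirArray = array();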
// Gets Each Entry
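while(($entryName = readdir($myDirectory)) !== false) {
// Use the last extension, lowercased, so names like "report.v2.DOCX" still match
$ext = strtolower(pathinfo($entryName, PATHINFO_EXTENSION));
if(in_array($ext, $acceptExts)) {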
$dirArray[] = $entryName;
}
}
// Finds extensions of files
function findexts ($filename) {
$filename=strtolower($filename);
$exts=preg_split('#[/\\\\.]#', $filename); // split() was removed in PHP 7
$n=count($exts)-1;
$exts=$exts[$n];
return $exts;
}
// Closes directory
closedir($myDirectory);
// Counts elements in array
$indexCount=count($dirArray);
// Sorts files
sort($dirArray);
// Loops through the array of files
for($index=0; $index < $indexCount; $index++) {
// Allows ./?hidden to show hidden files
if($_SERVER['QUERY_STRING']=="hidden")
{$hide="";
$ahref="./";
$atext="Hide";}
else
{$hide=".";
$ahref="./?hidden";
$atext="Show";}
if(substr("$dirArray[$index]", 0, 1) != $hide) {
// Gets File Names
$name=$dirArray[$index];
$namehref=$dirArray[$index];
// Gets Extensions
$extn=findexts($dirArray[$index]);
// Gets file size
$size=number_format(filesize($dirArray[$index]));
// Gets Date Modified Data
$modtime=date("M j Y", filemtime($dirArray[$index]));
$timekey=date("Ymd", filemtime($dirArray[$index]));
// Separates directories
if(is_dir($dirArray[$index])) {
$extn="<Directory>";
$size="<Directory>";
$class="dir";
} else {
$class="file";
}
// Cleans up . and .. directories
if($name=="."){$name=". (Current Directory)"; $extn="<System Dir>";}
if($name==".."){$name=".. (Parent Directory)"; $extn="<System Dir>";}
//Display to screen
print("
<tr class='$class'>
<td><a href='./$namehref'>$name</a></td>
<td sorttable_customkey='$timekey'><a href='./$namehref'>$modtime</a></td>
</tr>");
}
}
?>
Upvotes: 3
Views: 2755
Reputation: 26066
I would like to show docx properties such as Title, Author and any tags that have been added to the document. Is there a way to display those properties using just PHP?
What you are looking for is a tool that can extract metadata from a file. Once you understand what metadata is (basically, data that describes the data in a file or object), half the job is done. The rest is finding the tool that best fits your needs.
If you want a pure PHP solution, look into getID3, a well-developed PHP library that may be able to handle the task. I am not 100% sure about its ability to handle DOCX and other Microsoft formats, but it is worth a look.
There is also a PHP library called PHPWord (an open-source PHPOffice project rather than a Microsoft product) that lets you manipulate the contents of DOCX and related documents, so I assume metadata extraction is part of the mix.
Beyond PHP-specific libraries, if you are on Linux or a Unix variant like Mac OS X, look into a tool like exiftool, which I have used and highly recommend. Yes, it is a system binary, but you can call it from PHP via exec() to get it to work its magic.
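As a rough illustration of the exec() route, here is a minimal sketch. It assumes the exiftool binary is installed and on the PATH, and that the tags of interest are reported as Title, Creator and Keywords (an assumption worth checking against the output for your own files). The three function names are hypothetical helpers, not part of any library.

```php
<?php
// Sketch only: assumes the exiftool binary is installed and on the PATH.
// All function names here are hypothetical helpers.

// Builds the shell command; escapeshellarg() protects against odd filenames.
function exiftool_command(string $path): string
{
    // -json asks for JSON output; the tag options limit output to those tags
    // (the tag names are an assumption, check them against your own files).
    return 'exiftool -json -Title -Creator -Keywords ' . escapeshellarg($path);
}

// Decodes exiftool's JSON output and returns the tags of the first file.
function parse_exiftool_json(string $json): array
{
    $decoded = json_decode($json, true);
    if (is_array($decoded) && isset($decoded[0]) && is_array($decoded[0])) {
        return $decoded[0];
    }
    return array();
}

// Runs exiftool on one file; returns an empty array if the call fails.
function read_metadata_with_exiftool(string $path): array
{
    exec(exiftool_command($path), $outputLines, $exitCode);
    if ($exitCode !== 0) {
        return array();
    }
    return parse_exiftool_json(implode("\n", $outputLines));
}
```

Inside the loop you would call read_metadata_with_exiftool($dirArray[$index]) and print whichever keys come back, remembering to pass them through htmlspecialchars() before output.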
Looking at your specific code: since it only gets the directory contents via readdir, you would have to add some logic that takes those filenames and paths and passes the actual file to getID3, PHPWord or exiftool to read the metadata.
Looking quickly at how the loops in your code work, take this line that gets the file size:
// Gets file size
$size=number_format(filesize($dirArray[$index]));
Before or after that line, you would need to do something like this:
// Gets file info metadata (assumes the getID3 library has already been loaded,
// e.g. via require_once or Composer's autoloader).
$getID3 = new getID3;
$file_info = $getID3->analyze($dirArray[$index]);
The contents of $file_info would then be an array of data about the file at $dirArray[$index]. How do you access that data? I can't say offhand, but you can see what it grabbed by dumping the contents of $file_info like this:
echo '<pre>';
print_r($file_info);
echo '</pre>';
Then figure out where the data you want resides in $file_info and access it like any other array.
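If the dump turns out not to contain the Title, Author or tags for a docx, one pure PHP fallback is to read them directly. A .docx file is a zip container, and its core properties normally live in docProps/core.xml, with the title in dc:title, the author in dc:creator and the tags in cp:keywords. The sketch below assumes the zip and SimpleXML extensions are enabled; read_docx_core_properties() is a hypothetical helper name, not part of getID3 or PHPWord.

```php
<?php
// Sketch only: assumes the zip and SimpleXML extensions are enabled.
// read_docx_core_properties() is a hypothetical helper name.
function read_docx_core_properties(string $path): array
{
    // Defaults returned when the file or a property is missing.
    $props = array('title' => '', 'author' => '', 'tags' => '');

    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return $props; // not a readable zip container
    }
    $xml = $zip->getFromName('docProps/core.xml');
    $zip->close();
    if ($xml === false) {
        return $props; // no core properties part in this file
    }

    $core = simplexml_load_string($xml);
    if ($core === false) {
        return $props; // malformed XML
    }

    // Dublin Core and OPC core-properties namespaces used by core.xml.
    $dc = $core->children('http://purl.org/dc/elements/1.1/');
    $cp = $core->children('http://schemas.openxmlformats.org/package/2006/metadata/core-properties');

    $props['title']  = (string) $dc->title;
    $props['author'] = (string) $dc->creator;
    $props['tags']   = (string) $cp->keywords;
    return $props;
}
```

In the loop you would call this only when $extn is "docx", then add extra table cells for the returned values, escaped with htmlspecialchars().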
Upvotes: 4