Mitch Moccia

Reputation: 529

How do you display a formatted Word Doc in HTML/PHP?

What is the best way to display a formatted Word Doc in HTML/PHP?

Here is the code I currently have, but it outputs the text without any formatting:

$word = new COM("word.application") or die ("Could not initialise MS Word object.");
$word->Documents->Open(realpath("ACME.doc"));

// Extract content. Casting the Content range to a string yields plain text only.
$content = (string) $word->ActiveDocument->Content;

echo $content;

$word->ActiveDocument->Close(false);

$word->Quit();
$word = null;
unset($word);

Upvotes: 3

Views: 22154

Answers (3)

Tony Nassar

Reputation: 1

I needed correct XHTML, which Office won't give you (I don't understand why). You can use tools such as JTidy or TagSoup to fix the HTML if you need to. See http://slideguitarist.blogspot.com/2011/03/exporting-word-documents-to-html.html
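
If you would rather stay in PHP, a rough equivalent is to reparse the exported HTML with the DOM extension and serialise it again. This is a swapped-in technique, not JTidy or TagSoup, and html_to_wellformed_xml is a hypothetical helper name. It gives you well-formed markup only; it adds no XHTML namespace or doctype. A minimal sketch:

```php
<?php
// Hypothetical helper: reparse tag-soup HTML and serialise it as well-formed XML.
// Uses PHP's DOM extension as a stand-in for JTidy/TagSoup.
function html_to_wellformed_xml(string $html): string
{
    // Collect parser warnings internally instead of emitting them on messy input.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // saveXML() closes every element, including empty ones such as <br>.
    return $doc->saveXML($doc->documentElement);
}
```

You would call it on the contents of the file Word saved, for example html_to_wellformed_xml(file_get_contents($path)), where $path is wherever you saved the HTML.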

Upvotes: 0

Mitch Moccia

Reputation: 529

I figured this out. Here is the solution for reading a Word doc and outputting it as formatted HTML:

$filename = "ACME.doc";
$word = new COM("word.application") or die ("Could not initialise MS Word object.");
$word->Documents->Open(realpath($filename));

// Strip the ".doc" extension (4 characters) and append ".html".
$new_filename = substr($filename,0,-4) . ".html";
// SaveAs needs the full path, so build it once and read the file back from the same place.
$output_path = "C:/a1/projects/---full path--- /".$new_filename;

// the '2' parameter specifies saving in txt format
// the '6' parameter specifies saving in rtf format
// the '8' parameter specifies saving in html format
$word->Documents[1]->SaveAs($output_path,8);
$word->Documents[1]->Close(false);
$word->Quit();
//$word->Release();
$word = NULL;
unset($word);

// Read the exported HTML back and send it to the browser.
$fh = fopen($output_path, 'r');
$contents = fread($fh, filesize($output_path));
echo $contents;
fclose($fh);
//unlink($output_path);

A couple of things: having "charset=UTF-8" at the top of my PHP page was producing a bunch of diamonds with question marks. I deleted that and it works perfectly.

Also, SaveAs has to be given the full path, at least locally; I added that to get it to work.
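
The diamonds with question marks were presumably Windows-1252 bytes from Word's export being read as UTF-8. That is an assumption; I haven't checked which encoding Word actually wrote. If you want to keep the page in UTF-8 instead of dropping the charset declaration, another option is to convert the exported HTML before echoing it. A minimal sketch, where word_html_to_utf8 is a hypothetical helper name:

```php
<?php
// Hypothetical helper: re-encode Word's exported HTML as UTF-8.
// ASSUMPTION: the export is Windows-1252; check the file's <meta> tag first.
function word_html_to_utf8(string $html): string
{
    // Convert the bytes themselves from Windows-1252 to UTF-8.
    $utf8 = iconv('Windows-1252', 'UTF-8', $html);
    if ($utf8 === false) {
        throw new RuntimeException('Input is not valid Windows-1252.');
    }

    // Keep any embedded charset declaration in step with the new encoding.
    return preg_replace('/charset=windows-1252/i', 'charset=utf-8', $utf8);
}
```

Usage would be echo word_html_to_utf8($contents); in place of echo $contents;.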

Thanks again for your help.

Upvotes: 4

Charles

Reputation: 51411

I know nothing about COM, but poking around the Word API docs on MSDN, it looks like your best bet is to use Document.SaveAs with the wdFormatFilteredHTML format to save to a temporary file, then serve that HTML to the user. Be sure to pick the filtered HTML, otherwise you're going to get the soupiest tag soup ever.
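
Something along these lines might work. It is an untested sketch that assumes Windows, the PHP COM extension, and an installed copy of Word. The function names and the WD_FORMAT_FILTERED_HTML constant are made up for the example; 10 is the numeric value of wdFormatFilteredHTML in Word's WdSaveFormat enumeration.

```php
<?php
// Numeric value of Word's wdFormatFilteredHTML save format.
const WD_FORMAT_FILTERED_HTML = 10;

// Hypothetical helper: build "<dir>/<name>.html" for a document.
// pathinfo() handles both ".doc" and ".docx", unlike substr($name, 0, -4).
function html_path_for(string $doc_path, string $out_dir): string
{
    $base = pathinfo($doc_path, PATHINFO_FILENAME);
    return rtrim($out_dir, '/\\') . DIRECTORY_SEPARATOR . $base . '.html';
}

// Sketch only: needs Windows, the COM extension, and Word installed.
function word_to_filtered_html(string $doc_path): string
{
    // SaveAs wants a full path, so write into the system temp directory.
    $html_path = html_path_for($doc_path, sys_get_temp_dir());

    $word = new COM('word.application');
    try {
        $doc = $word->Documents->Open(realpath($doc_path));
        $doc->SaveAs($html_path, WD_FORMAT_FILTERED_HTML);
        $doc->Close(false);
    } finally {
        // Always shut Word down, even if opening or saving failed.
        $word->Quit();
    }

    $html = file_get_contents($html_path);
    if ($html === false) {
        throw new RuntimeException("Could not read $html_path");
    }
    unlink($html_path); // remove the temporary file once it has been read
    return $html;
}
```

You would then echo word_to_filtered_html("ACME.doc"); to serve the result. Any images Word writes alongside the HTML file are not handled here.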

Upvotes: 3
