Vladimir Alexiev

Reputation: 2601

OLE Automation with Perl in 2015

I'm trying to open a Word docx file from Perl and save it as HTML. I have 64-bit Windows 7 and Office 15 (Office 365 subscription). I read a few examples and tried both Strawberry Perl and ActiveState Perl, but I always get errors:

perl -MWin32::OLE -e "$wd = Win32::OLE->GetObject('1994.62_01_fnd_en.docx'); print Win32::OLE->LastError"
Win32::OLE(0.1712) error 0x80004005: "Unspecified error"

perl -e "use Win32::OLE::Const('.*Word.*')"
No type library matching ".*Word.*" found at -e line 1.
Win32::OLE(0.1712): GetOleTypeLibObject() Not a Win32::OLE::TypeLib object at C:/Perl64/lib/Win32/OLE/Const.pm line 49.

Upvotes: 1

Views: 1252

Answers (1)

Vladimir Alexiev

Reputation: 2601

GetObject still doesn't work for me, but Win32::OLE->new('Word.Application') does, and I was able to do the job with a script like this:

use Win32::OLE; # http://search.cpan.org/~jdb/Win32-OLE-0.1712/lib/Win32/OLE.pm
use Win32::OLE::Variant; # http://search.cpan.org/~jdb/Win32-OLE-0.1712/lib/Win32/OLE/Variant.pm
# VT_BOOL variants for Word's Boolean parameters; Variant() is exported by Win32::OLE::Variant
use constant true  => Variant(VT_BOOL, 1);
use constant false => Variant(VT_BOOL, 0);

use Cwd;
# use Path::Abstract qw(path); # http://search.cpan.org/~rokr/Path-Abstract-0.096/lib/Path/Abstract.pm#$path->extension
use constant MAX => 1024000; # max file size to open

# https://msdn.microsoft.com/en-us/library/office/ff839952.aspx
use constant wdFormatUnicodeText => 7;
use constant wdFormatFilteredHTML => 10;
# use Win32::OLE::Const '.*Microsoft Word'; # http://search.cpan.org/~jdb/Win32-OLE-0.1712/lib/Win32/OLE/Const.pm
# No type library matching ".*Word" found at -e line 1.
# Win32::OLE(0.1712): GetOleTypeLibObject() Not a Win32::OLE::TypeLib object at C:/Perl64/lib/Win32/OLE/Const.pm line 49.

my $w = Win32::OLE->new('Word.Application');
# https://msdn.microsoft.com/en-us/library/aa171814(v=office.11).aspx
$w->ChangeFileOpenDirectory(cwd);
for my $doc (<doc/*>) {
  next if -s $doc > MAX;
  my $html = $doc; $html =~ s{\bdocx?\b}{html}g;
  my $txt  = $doc; $txt  =~ s{\bdocx?\b}{txt}g;
  # https://msdn.microsoft.com/EN-US/library/office/ff835182.aspx
  my $d = $w->Documents->Open($doc, {
    ConfirmConversions => false, ReadOnly => true, OpenAndRepair => false,
    AddToRecentFiles => false, Visible => false,
  });
  # https://msdn.microsoft.com/en-us/library/office/ff836084.aspx
  $d->SaveAs2({FileName => $html, FileFormat => wdFormatFilteredHTML});
  $d->SaveAs2({FileName => $txt,  FileFormat => wdFormatUnicodeText});
  # https://msdn.microsoft.com/EN-US/library/office/ff196343.aspx
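  $d->Close;
  last; # stops after the first converted file; remove this line to process every file in doc/
}
$w->Quit; # shut down the Word instance started above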
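
One thing to be aware of in the script above: the substitution s{\bdocx?\b}{html}g also matches the doc directory component, so doc/1994.62_01_fnd_en.docx becomes html/1994.62_01_fnd_en.html (and txt/... for the text copy). That is fine if the html and txt directories exist and that layout is what you want. If you would rather keep the output next to the input, a minimal sketch along these lines changes only the extension. The helper name with_extension is hypothetical (it is not part of the original script); fileparse comes from the core File::Basename module.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: replace only a trailing .doc/.docx extension,
# leaving the directory part of the path untouched.
sub with_extension {
    my ($path, $new_ext) = @_;
    # fileparse matches the suffix pattern at the end of the file name only
    my ($name, $dir, $suffix) = fileparse($path, qr/\.docx?/i);
    return $path if $suffix eq '';   # not a Word file: return it unchanged
    return $dir . $name . '.' . $new_ext;
}

print with_extension('doc/report.docx', 'html'), "\n";   # doc/report.html
print with_extension('doc/report.doc',  'txt'),  "\n";   # doc/report.txt
```

In the loop this would stand in for the two substitution lines, e.g. my $html = with_extension($doc, 'html');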

Upvotes: 2
