Reputation:
I'm working on a VB6 application that is used by over a hundred users. It generates a Word document, then saves a TIFF image of the document in a database. Currently, it simply sets the printer to Microsoft Office Document Image Writer, "prints" the document to a set location, then imports the resulting TIFF file into the database. However, the organization is in the process of upgrading everyone to Office 2007, and this means that Microsoft Office Document Image Writer is going away. So, I'd like to know how hard it would be to programmatically convert from Word to TIFF.
We're already bringing in a C# (.NET 3.5) control library as COM, so that seems like a good place to put the functionality. At some point I'll be converting the whole app to 3.5, so I'd prefer that any new code be already there so there's less to convert.
EDIT: I appreciate the suggestions, but I'd really like to try and do this without using expensive third-party components. It's just hard to get the money guys to see the merit of spending thousands of dollars to fix something that used to work for free. Plus, I'm genuinely interested in what it would take to roll it myself. A bit masochistic, I know, but I got into programming because I'm cursed with a desire to know how things work... :)
Thanks for all your help!
Upvotes: 1
Views: 8355
Reputation: 650
You can use the Aspose.Words for .NET package (a commercial library). Here is some sample code:
using System;
using System.IO;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Saving;
using NLog; // assumption: LogManager below is NLog's

public byte[] ConvertWordToTiff(byte[] sourceWordDoc)
{
    return ConvertWord(sourceWordDoc, SaveFormat.Tiff);
}

// Loads the document from memory and renders it in the requested format.
// Returns null if the conversion fails (the exception is logged).
private static byte[] ConvertWord(byte[] sourceWordDoc, SaveFormat format)
{
    byte[] result = null;
    try
    {
        var doc = new Document(new MemoryStream(sourceWordDoc));
        ClearFormat(doc);
        var options = SaveOptions.CreateSaveOptions(format);
        options.PrettyFormat = true;
        options.UseAntiAliasing = true;
        options.UseHighQualityRendering = true;
        using (var m = new MemoryStream())
        {
            doc.Save(m, options);
            result = m.ToArray();
        }
    }
    catch (Exception ex)
    {
        LogManager.GetCurrentClassLogger().Fatal(ex);
    }
    return result;
}

// Optional, specific to my documents: replaces Nastaliq fonts with
// Times New Roman and caps their size at 12 pt before rendering.
private static void ClearFormat(Document doc)
{
    for (var i = 0; i < doc.Sections.Count; i++)
    {
        var nodes = doc.Sections[i].GetChildNodes(NodeType.Run, true);
        if (nodes == null || nodes.Count <= 0) continue;
        foreach (var item in (from Run run in nodes
                              where run.Font.Name.ToLower().Contains("nastaliq")
                              select run).ToList())
        {
            item.Font.Name = "Times New Roman";
            item.Font.Size = item.Font.Size > 12 ? 12 : item.Font.Size;
        }
    }
}
Upvotes: 0
Reputation: 199
You can convert a Word document to a TIFF programmatically by using the standard "Fax" driver that is supplied with Microsoft Windows. The key to making this work is ensuring that the OutputFileName has an extension of ".tiff". Here is the sample code (VB.NET and Word 2010):
Dim objWdDoc As Word.Document
Dim objWord As Word.Application
Dim sDesktop As String = Environment.GetEnvironmentVariable("userprofile") & "\Desktop\"
objWord = CreateObject("Word.Application")
objWdDoc = objWord.Documents.Open(sDesktop & "testdocument.doc")
objWord.Visible = True
'Select Printer
objWord.ActivePrinter = "Fax"
'Print to Tiff
objWdDoc.PrintOut(Range:=WdPrintOutRange.wdPrintAllDocument, _
OutputFileName:=sDesktop & "test.tiff", _
Item:=WdPrintOutItem.wdPrintDocumentContent, _
PrintToFile:=True)
'Close Document
objWdDoc.Close()
'Close Word
objWord.Quit()
'General Cleanup
objWdDoc = Nothing
objWord = Nothing
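Since the question mentions a C# (.NET 3.5) library, here is a rough C# sketch of the same steps. It uses late binding through reflection, so it does not need a reference to a particular Word interop assembly. The class and method names (WordFaxPrinter, PrintToTiff, Call) are made up for illustration, and passing Background:=False is my own addition so that Word finishes printing before the document is closed; I have not verified this against every Word version, so treat it as a starting point rather than a finished implementation.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical helper: prints a Word document to a TIFF file via the "Fax" printer.
public static class WordFaxPrinter
{
    public static void PrintToTiff(string docPath, string tiffPath)
    {
        // Throws if Word is not installed on this machine.
        Type wordType = Type.GetTypeFromProgID("Word.Application", true);
        object word = Activator.CreateInstance(wordType);
        object documents = null;
        object doc = null;
        try
        {
            documents = wordType.InvokeMember("Documents",
                BindingFlags.GetProperty, null, word, null);
            doc = Call(documents, "Open", docPath);

            // Select the printer, as in the VB.NET code above.
            wordType.InvokeMember("ActivePrinter",
                BindingFlags.SetProperty, null, word, new object[] { "Fax" });

            // Named arguments: the values line up with the names in the last array.
            doc.GetType().InvokeMember("PrintOut", BindingFlags.InvokeMethod,
                null, doc,
                new object[] { false, tiffPath, true },
                null, null,
                new string[] { "Background", "OutputFileName", "PrintToFile" });

            Call(doc, "Close", false); // do not save changes
        }
        finally
        {
            Call(word, "Quit", false);
            // Release the COM references so WINWORD.EXE can exit.
            if (doc != null) Marshal.ReleaseComObject(doc);
            if (documents != null) Marshal.ReleaseComObject(documents);
            Marshal.ReleaseComObject(word);
        }
    }

    // Invokes a method on a late-bound COM object.
    private static object Call(object target, string name, params object[] args)
    {
        return target.GetType().InvokeMember(name,
            BindingFlags.InvokeMethod, null, target, args);
    }
}
```

Usage would be something like WordFaxPrinter.PrintToTiff(@"C:\temp\testdocument.doc", @"C:\temp\test.tiff"), with both paths being placeholders. As with the VB.NET version, this depends on a printer named "Fax" being installed for the user running the code.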
Upvotes: 1
Reputation:
Microsoft Office Document Image Writer is still available in Office 2007 (at least with Enterprise) - it's an optional component.
Upvotes: 0
Reputation: 12590
As far as I know (and a quick Google search seems to confirm this), the specifications for both the TIFF format and the DOC binary format are available for free on the web. So you could write code to read the DOC file and populate an object model, then write more code to output that object model as a TIFF document. This would be a fairly big and complex project, though (I'm thinking man-months rather than man-weeks).
But, just think of some of the complexities: Tables, formatting, character sets, spacing, embedded content, etc. Eek! I guess this is why it is normally the job of expensive third party libraries or professional document management systems.
Out of interest, might this be the time to move away from proprietary document formats and store the document in the DB as something more manageable?
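For what it's worth, the TIFF-writing half does not have to be hand-rolled in .NET: once each page exists as a bitmap, System.Drawing can encode a multi-page TIFF. The sketch below shows only that output step and assumes you already have one Bitmap per page (getting there from a DOC file is still the hard part). The names TiffWriter and SaveMultiPageTiff are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using Encoder = System.Drawing.Imaging.Encoder;

// Hypothetical helper: writes already-rendered page bitmaps as one multi-page TIFF.
public static class TiffWriter
{
    public static void SaveMultiPageTiff(IList<Bitmap> pages, string path)
    {
        if (pages == null || pages.Count == 0)
            throw new ArgumentException("At least one page is required.", "pages");

        // Find the built-in TIFF encoder.
        ImageCodecInfo tiffCodec = null;
        foreach (ImageCodecInfo codec in ImageCodecInfo.GetImageEncoders())
        {
            if (codec.FormatID == ImageFormat.Tiff.Guid)
            {
                tiffCodec = codec;
                break;
            }
        }
        if (tiffCodec == null)
            throw new InvalidOperationException("No TIFF encoder is installed.");

        Bitmap first = pages[0];
        using (EncoderParameters parameters = new EncoderParameters(1))
        {
            // The first page creates the file and marks it as multi-frame.
            parameters.Param[0] = new EncoderParameter(
                Encoder.SaveFlag, (long)EncoderValue.MultiFrame);
            first.Save(path, tiffCodec, parameters);

            // Each further page is appended as a new frame.
            parameters.Param[0] = new EncoderParameter(
                Encoder.SaveFlag, (long)EncoderValue.FrameDimensionPage);
            for (int i = 1; i < pages.Count; i++)
                first.SaveAdd(pages[i], parameters);

            // Flush finalizes the file.
            parameters.Param[0] = new EncoderParameter(
                Encoder.SaveFlag, (long)EncoderValue.Flush);
            first.SaveAdd(parameters);
        }
    }
}
```

This leaves compression at the encoder's default; an Encoder.Compression parameter could be added if file size matters for the database.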
Upvotes: 2