Jadenkun

Reputation: 327

Convert ByteArray of Office document to ByteArray of PDF in C#

How can I convert a byte[] of an Office document (.doc, .docx, .xlsx, .pptx) to a byte[] of a PDF document assuming Office is installed and Microsoft.Office.Interop is used?

I fetch each file's byte array and its name from the database. I would like to first convert each file to a PDF and then combine all of the PDFs into a single PDF using PDFSharp (this part is already implemented).

Code:

    foreach (Entity en in res.Entities)
    {
        byte[] fileByteArray = Convert.FromBase64String(en.GetAttributeValue<string>("documentbody"));
        string fileName = en.GetAttributeValue<string>("filename");
        // Path.GetExtension also handles names with several dots, e.g. "report.v2.docx"
        string extension = Path.GetExtension(fileName).TrimStart('.').ToLowerInvariant();

        switch (extension)
        {
            case "doc":
            case "docx":
                byteArr.Add(ConvertWordToPdf(fileName, fileByteArray));
                break;
            case "xlsx":
                byteArr.Add(ConvertExcelToPdf(fileName, fileByteArray));
                break;
        }
    }

The problem is I'm not too sure how to implement these two methods. I tried using the following code:

    private byte[] ConvertWordToPdf(string fileName, byte[] fileByteArray)
    {
        string tmpFile = Path.GetTempFileName();
        File.WriteAllBytes(tmpFile, fileByteArray);

        Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();

        Document doc = app.Documents.Open(tmpFile);

        // Save Word doc into a PDF
        string pdfPath = fileName.Split('.')[0] + ".pdf";
        doc.SaveAs2(pdfPath, Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatPDF);

        doc.Close();
        app.Quit();

        byte[] pdfFileBytes = File.ReadAllBytes(pdfPath);
        File.Delete(tmpFile);
        return pdfFileBytes;
    }

But this saves the file to disk, which is something I would like to avoid. Is it possible to do the same conversion without saving to disk?

Upvotes: 2

Views: 1288

Answers (1)

JonasH

Reputation: 36341

If you check the documentation for Documents.Open, there is no mention of opening a document directly from a stream. This is unfortunately an all too common limitation in libraries. There might be other libraries you could use that allow this.

I would not expect saving to a file to be a major performance issue since the conversion will probably be the dominating factor. But it might cause permission issues if your program is running in a very restrictive environment.

If you keep the file-based approach, you should add exception handling to ensure the temporary files are deleted even if an exception occurs. I have also seen cases where external programs only release their file locks after some time, so it might be useful to retry the deletion a few times.
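
As a rough illustration of that cleanup advice, here is a minimal sketch using only System.IO. The names `TempFileConversion`, `ConvertViaTempFiles` and `TryDelete` are made up for this example and are not part of the Office interop or PDFSharp. The retry count and delay are arbitrary assumptions, and the actual conversion is passed in as a delegate so the sketch does not depend on Office being installed.

```csharp
using System;
using System.IO;
using System.Threading;

public static class TempFileConversion
{
    // Hypothetical helper: writes the input bytes to a temp file, lets the
    // caller-supplied convert(inputPath, outputPath) produce the PDF, reads
    // the result back, and removes both temp files even if convert throws.
    public static byte[] ConvertViaTempFiles(
        byte[] input, string inputExtension, Action<string, string> convert)
    {
        string baseName = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        string inputPath = baseName + inputExtension;   // e.g. ".docx"
        string outputPath = baseName + ".pdf";

        try
        {
            File.WriteAllBytes(inputPath, input);
            convert(inputPath, outputPath);
            return File.ReadAllBytes(outputPath);
        }
        finally
        {
            // Runs on success and on failure alike.
            TryDelete(inputPath);
            TryDelete(outputPath);
        }
    }

    // Retries the deletion a few times in case another process still holds
    // a lock on the file. Returns true once File.Delete succeeds.
    // The defaults (5 attempts, 200 ms apart) are assumptions, not tuned values.
    public static bool TryDelete(string path, int attempts = 5, int delayMs = 200)
    {
        for (int i = 0; i < attempts; i++)
        {
            try
            {
                File.Delete(path);   // does not throw if the file is already gone
                return true;
            }
            catch (IOException) { }                 // file still locked
            catch (UnauthorizedAccessException) { } // access denied, possibly transient

            if (i < attempts - 1)
            {
                Thread.Sleep(delayMs);
            }
        }
        return false;
    }
}
```

With the code from the question, the delegate would wrap the `app.Documents.Open(src)`, `doc.SaveAs2(dst, WdSaveFormat.wdFormatPDF)`, `doc.Close()` and `app.Quit()` calls, ideally with its own try/finally so that Word is closed if the conversion fails. Writing the PDF to the temp directory instead of a path derived from `fileName` also avoids leaving the converted file in the working directory.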

Upvotes: 1
