Reputation: 45
I am trying to convert a docx file to a PDF. I am using code from Stack Overflow, modified so that the file to open is selected dynamically (rather than being a hard-coded value). When I run it, I get a "could not find file" exception on the Open() method. I select the file using a FileUpload control, so I know the file is there. What's going on?
Here is my code:
using System;
using System.IO;
using Microsoft.Office.Interop.Word;
using OpenXmlPowerTools;

namespace DocxToPdf
{
    public partial class WebForm1 : System.Web.UI.Page
    {
        public Microsoft.Office.Interop.Word.Document wordDoc;

        protected void Page_Load(object sender, EventArgs e)
        {
        }

        protected void UploadButton_Click(object sender, EventArgs e)
        {
            if (DocxFileUpload.HasFile)
            {
                string docxFile = DocxFileUpload.PostedFile.FileName;
                FileInfo fiFile = new FileInfo(docxFile);
                if (Util.IsWordprocessingML(fiFile.Extension))
                {
                    Guid pdfFileGuid = Guid.NewGuid();
                    string pdfFileLoc = string.Format(@"c:\windows\temp\{0}.pdf", pdfFileGuid.ToString());
                    Microsoft.Office.Interop.Word.Application appWord = new Microsoft.Office.Interop.Word.Application();
                    wordDoc = appWord.Documents.Open(docxFile);
                    wordDoc.ExportAsFixedFormat(pdfFileLoc, WdExportFormat.wdExportFormatPDF);
                    MsgLabel.Text = "File converted to PDF";
                }
                else
                {
                    MsgLabel.Text = "Not a WordProcessingML document.";
                }
            }
            else
            {
                MsgLabel.Text = "You have not specified a file.";
            }
        }
    }
}
The error occurs on the "wordDoc = appWord.Documents.Open(docxFile);" line.
The FileUpload control's FileName property has just the file name, not the fully qualified path. I understand why I'm getting a "file not found" error: the file name doesn't include the fully qualified path. My question to the group is, how do I get the fully qualified path and file name, so I can open the file? I've run a debug session and examined all the properties of the FileUpload control and the FileInfo object, but they don't have it. The "FullPath" property of the FileInfo object is set to "c:\Program Files (x86)\IIS Express\myfile.docx", but that's not where the file is located.
Here's some more information about the error: Exception System.Runtime.InteropServices.COMException in DocxToPdf.dll (Sorry, we couldn't find your file. Is it possible it was moved, renamed or deleted? C:\Windows...\myfile.docx...
I've googled around on this, but so far no luck. Please help! Thanks.
Upvotes: 0
Views: 2470
Reputation: 52290
First off, you should be aware that with web applications there are two machines at work-- the client (where the browser runs) and the server (where your app lives). Each has its own file system. The server cannot access the client's file system and vice versa-- this is for obvious security reasons. Now maybe it works on a development machine because you are running the site locally, but it would never work in a production environment.
So Microsoft Word cannot open a file that is located on the client machine. Period. The client can upload a file, and the FileUpload control will let you access the bytestream-- but it doesn't automatically save the file locally. You can't access the path, either, because the path is on the client's filesystem and the names of his folders are private information.
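This also accounts for the IIS Express path reported in the question. As a minimal sketch (the class and method names here are made up for illustration), a FileInfo built from a bare file name resolves that name against the server process's current working directory, which says nothing about where the file lives on the client:

```csharp
using System;
using System.IO;

// Hypothetical demo class; not part of the question's code.
static class PostedNameDemo
{
    // Shows where a bare file name ends up when the server turns it into a full path.
    public static string ResolveOnServer(string postedName)
    {
        // FileInfo resolves relative names against the current working directory
        // of the server process, not against any folder on the client machine.
        return new FileInfo(postedName).FullName;
    }
}
```

Calling PostedNameDemo.ResolveOnServer("myfile.docx") returns the current directory plus the file name, so under IIS Express it would plausibly point into the IIS Express folder, as seen in the debugger.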
To get this scheme to work at all, you need to first save the uploaded file somewhere locally using FileUpload.SaveAs. Then you should use that saved file to open it up in Word. Something like this:
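// Save the uploaded bytes to a file on the server.
var filePath = Path.GetTempFileName();
DocxFileUpload.SaveAs(filePath);
// Open the server-side copy in Word and export it as a PDF.
var appWord = new Microsoft.Office.Interop.Word.Application();
var wordDoc = appWord.Documents.Open(filePath);
var convertedFilePath = Path.GetTempFileName();
wordDoc.ExportAsFixedFormat(convertedFilePath, WdExportFormat.wdExportFormatPDF);
// Close the document and quit Word so the process does not linger and keep the files locked.
wordDoc.Close(false);
appWord.Quit();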
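One possible variation, offered as an assumption rather than as part of the code above: Path.GetTempFileName() hands back a name ending in .tmp, so if you would rather have the files carry their real extensions, a small helper along these lines could build a matching pair of paths. TempPaths and NewPair are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper; the names are illustrative only.
static class TempPaths
{
    // Returns a matching pair of paths in the temp folder: <guid>.docx and <guid>.pdf.
    // Nothing is created on disk; the caller saves and exports to these paths.
    public static void NewPair(out string docxPath, out string pdfPath)
    {
        string baseName = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        docxPath = baseName + ".docx";
        pdfPath = baseName + ".pdf";
    }
}
```

The GUID keeps concurrent requests from colliding, and sharing one base name makes it easy to see which PDF came from which upload.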
You will then need to provide some means of getting the converted file back to the browser, by writing it to the HTTP response. Example:
Response.Clear();
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment; filename=Converted.Pdf");
Response.TransmitFile(convertedFilePath);
// Flush so the file is actually sent before any cleanup code deletes it.
Response.Flush();
Don't forget to clean up your files afterward, or you will run out of disk space as more and more users use your application:
try
{
    // save, convert, and send the file here
}
finally
{
    File.Delete(filePath);
    File.Delete(convertedFilePath);
}
I put the delete commands in a finally block so that they run even if something goes wrong, e.g. the request times out. You need those files to get cleaned up no matter what. You might also want to schedule a system task to clean up the folder on a nightly basis, just in case one of the files is locked due to Word being hung, that sort of thing.
Also, make sure your application's AppPool can read and write to the temp folder.
If you want to use a separate handler for downloading
If you want to show other content alongside the PDF, you'd have to use a separate handler for downloading. Here's a rough outline:
There are three URLs used in this solution:
Upload.aspx: The page that allows the user to specify a file for uploading
Confirm.aspx: The page that is displayed in response, which includes a large iFrame
File.ashx: The handler that returns the PDF that is displayed in the iFrame
You've already coded Upload.aspx.
Confirm.aspx needs code to accept the upload, save locally, open Word, and convert the file. The path of the converted file needs to be converted to a token of some kind. The page then needs to return a page that contains an iFrame pointed at File.ashx?docID=token.
File.ashx needs to set the response headers, use the token to recreate the path of the PDF file, and return the file over the HttpResponse.
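The token scheme is left open above. One way it might look, as a sketch only: use a GUID as the token and derive the PDF path from it, so File.ashx can rebuild the path without ever trusting a client-supplied file name. PdfTokens, NewToken and TryGetPath are hypothetical names, and storing the files in the system temp folder is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical helper shared by Confirm.aspx and File.ashx; the names are illustrative.
static class PdfTokens
{
    // Folder that holds the converted files; assumed here to be the system temp folder.
    public static readonly string Folder = Path.GetTempPath();

    // Confirm.aspx: pick a fresh token; the PDF is then written to the path it maps to.
    public static string NewToken()
    {
        return Guid.NewGuid().ToString("N");
    }

    // File.ashx: rebuild the path from the docID token. Anything that is not a
    // 32-digit GUID is rejected, so a crafted token cannot point outside the folder.
    public static bool TryGetPath(string token, out string path)
    {
        path = null;
        Guid id;
        if (!Guid.TryParseExact(token, "N", out id))
        {
            return false;
        }
        path = Path.Combine(Folder, id.ToString("N") + ".pdf");
        return true;
    }
}
```

Confirm.aspx would call NewToken(), export the PDF to the path returned by TryGetPath, and put the token in the iFrame URL. File.ashx would call TryGetPath on the docID query value and return an error response when it fails.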
At some point you will need to figure out how to clean up the temp folder, perhaps with a job that runs regularly and deletes any .doc or .pdf file older than 10 minutes, that sort of thing.
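A cleanup job of that kind could be sketched as follows. This is only an outline under assumptions: TempCleanup and DeleteOldFiles are hypothetical names, and how the method gets scheduled (a timer, a scheduled task) is left to you.

```csharp
using System;
using System.IO;

// Hypothetical cleanup helper; the names are illustrative only.
static class TempCleanup
{
    // Deletes files in the folder that have one of the given extensions and were
    // last written longer ago than maxAge. Returns the number of files removed.
    public static int DeleteOldFiles(string folder, TimeSpan maxAge, params string[] extensions)
    {
        DateTime cutoff = DateTime.UtcNow - maxAge;
        int deleted = 0;
        foreach (string file in Directory.GetFiles(folder))
        {
            string ext = Path.GetExtension(file);
            bool matches = Array.Exists(extensions,
                e => string.Equals(e, ext, StringComparison.OrdinalIgnoreCase));
            if (!matches || File.GetLastWriteTimeUtc(file) > cutoff)
            {
                continue; // wrong type or still too new
            }
            try
            {
                File.Delete(file);
                deleted++;
            }
            catch (IOException)
            {
                // Possibly still open in a hung Word process; leave it for the next run.
            }
            catch (UnauthorizedAccessException)
            {
                // No permission to delete this file; skip it.
            }
        }
        return deleted;
    }
}
```

A call such as TempCleanup.DeleteOldFiles(folder, TimeSpan.FromMinutes(10), ".doc", ".pdf") matches the rule described above. Pointing it at a folder dedicated to this application, rather than the shared temp folder, avoids deleting files that belong to other programs.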
Upvotes: 1