Reputation: 379
Using ASP.NET, I'd like to save user uploads of MS Office Word .doc or .docx files for subsequent display. My code grabs the InputStream of the uploaded file, loops through it with a StreamReader, and then saves it to a file with a StreamWriter, but the result is a mess with lots of nasty characters, even though it says it's UTF-8 encoding.
Is there something I can do with encoding the string I build -- or something else -- that will do the trick?
An alternative would be to programmatically save the uploaded Word document as an HTML file, if anyone has any ideas on how to do that.
Here is my relevant code:
Dim htmlfile As String = Server.MapPath("drafts" & "/d" & draftID & ".html")
Dim strm As Stream = fileup1.PostedFile.InputStream
Dim sb As String = ""
Using sr As New StreamReader(strm)
    Dim line As String = ""
    While Not line Is Nothing
        line = sr.ReadLine()
        sb += line & "<br />"
    End While
End Using
Dim sw As StreamWriter = New StreamWriter(htmlfile)
sw.Write(sb)
Upvotes: 0
Views: 731
Reputation: 24908
I'm afraid your approach reads the Word document as a text file, but Word documents are actually binary files (PKZIP archives in the case of .docx!), so decoding them as UTF-8 text cannot produce anything readable.
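To illustrate the point, here is a minimal console sketch (not the asker's page code) comparing a text round trip with a plain byte copy. The byte array is a made-up stand-in for an upload that only starts with the ZIP signature, the module name is hypothetical, and Stream.CopyTo assumes .NET 4 or later.

```vb
Imports System
Imports System.IO
Imports System.Linq
Imports System.Text

Module BinaryUploadSketch
    Sub Main()
        ' Hypothetical stand-in for an uploaded .docx: the ZIP signature
        ' ("PK", 03, 04) followed by a few bytes that are not valid UTF-8.
        Dim original As Byte() = {&H50, &H4B, &H3, &H4, &HFF, &HFE, &H0, &H80, &HC3}

        ' Text round trip, similar in spirit to StreamReader + StreamWriter:
        ' bytes that are not valid UTF-8 are replaced during decoding.
        Dim asText As String = Encoding.UTF8.GetString(original)
        Dim roundTripped As Byte() = Encoding.UTF8.GetBytes(asText)

        ' Binary copy: the bytes are written exactly as they were received.
        Dim target As String = Path.GetTempFileName()
        Using source As New MemoryStream(original)
            Using destination As FileStream = File.Create(target)
                source.CopyTo(destination)
            End Using
        End Using
        Dim copied As Byte() = File.ReadAllBytes(target)
        File.Delete(target)

        Console.WriteLine("Text round trip intact: {0}", original.SequenceEqual(roundTripped))
        Console.WriteLine("Binary copy intact: {0}", original.SequenceEqual(copied))
    End Sub
End Module
```

The text round trip should report that the bytes changed, while the binary copy keeps them intact. In the page itself, if the goal is only to store the upload unchanged, fileup1.PostedFile.SaveAs(path) writes the raw bytes to disk without any decoding.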
The approach you should probably take is to use the managed Word library Microsoft.Office.Tools.Word or the Word COM object to open the file and save it as HTML. This way, you let Word handle the extremely dirty details of decoding its own file format.
Here is the MSDN documentation for Document.SaveAs
and here is a simple COM example.
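As a rough sketch of the COM route, assuming Word is installed on the machine and the project references the Microsoft.Office.Interop.Word interop assembly (an assumption; it is not necessarily the library linked above), the conversion could look something like this. The module name, the ConvertToHtml helper and both path parameters are hypothetical.

```vb
Imports Word = Microsoft.Office.Interop.Word

Module WordToHtmlSketch
    ' Hypothetical helper: opens a saved .doc/.docx and writes it back out as HTML.
    ' docPath and htmlPath are assumed to be absolute paths on the server.
    Sub ConvertToHtml(ByVal docPath As String, ByVal htmlPath As String)
        ' The interop methods take ByRef Object parameters, so the
        ' arguments are boxed into Object variables first.
        Dim source As Object = docPath
        Dim target As Object = htmlPath
        Dim htmlFormat As Object = Word.WdSaveFormat.wdFormatFilteredHTML

        Dim app As New Word.Application()
        Try
            Dim doc As Word.Document = app.Documents.Open(source)
            ' Let Word decode its own format and emit the HTML.
            doc.SaveAs(target, htmlFormat)
            doc.Close()
        Finally
            ' Always shut Word down, or WINWORD.EXE processes pile up.
            app.Quit()
        End Try
    End Sub
End Module
```

With this shape, the upload would first be saved to disk as-is (for example with fileup1.PostedFile.SaveAs) and then passed to the helper, something like ConvertToHtml(savedDocPath, htmlfile), where savedDocPath is wherever the upload was stored.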
Upvotes: 1