user560498

Reputation: 547

Insert a Word document into a SQL Server database?

I feel quite overwhelmed by the variety of technologies I would need to use for the above task. I've searched Stack Overflow but couldn't pinpoint a solid checklist of steps for doing this.

I would like to get an overview of the steps and tools needed to insert a Word document into a database.

I thought about:

  1. Reading the Word file as a FileStream.
  2. Deserializing it into an XML object (WordML).
  3. Somehow (not sure how) inserting the WordML into an xml column in SQL Server.

Is it possible to read WordML using the XmlSerializer class? How would I then insert it into the database?

Edit: I actually need to perform operations on the stored data, such as finding nodes using XPath, hence my need to store it as XML.

Upvotes: 2

Views: 8221

Answers (5)

user806549

Reputation:

You should go with either FILESTREAM or ordinary BLOB storage. FILESTREAM does require a little more initial work, and I have had problems upgrading certain already-installed databases to use it. Depending on your ability or willingness to reinstall servers to get this working, you should certainly do a proof of concept before going too far. Technically, I've never had problems using BLOBs.

Some research has been done on which should be preferred depending on your usage pattern. For example, if your files are larger than 1 MB on average and you need fast read access, you might be better off using FILESTREAM.

I've only rarely seen the performance difference myself, but I do prefer FILESTREAM from a design viewpoint.

Take a look at:

http://technet.microsoft.com/en-us/library/bb933993.aspx

http://www.mssqltips.com/sqlservertip/1489/using-filestream-to-store-blobs-in-the-ntfs-file-system-in-sql-server-2008/

Upvotes: 2

Jeff

Reputation: 14279

I don't think you'll be able to use XML for this; I believe Word docs have binary content in them. I would try reading the file with a FileStream and storing it in the database as varbinary(max). This is certainly the most general and flexible way to handle it, and you would be able to reuse the code you write for any other file type if you choose to expand the functionality later.
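
As a rough sketch of that approach (in Python rather than .NET, and with the standard-library sqlite3 module used purely as a stand-in so the example runs anywhere), the file is read as raw bytes and passed as a query parameter. The `Documents` table, its columns, and the helper names are hypothetical. Against SQL Server the `Content` column would be varbinary(max), the connection would come from a SQL Server driver, and details such as retrieving the new row id may differ.

```python
import sqlite3
import tempfile
from pathlib import Path


def store_file(conn, path):
    """Insert the file at `path` as a blob and return the new row id."""
    path = Path(path)
    data = path.read_bytes()  # raw bytes; the file is not parsed at all
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO Documents (FileName, Content) VALUES (?, ?)",
        (path.name, data),
    )
    conn.commit()
    return cur.lastrowid  # sqlite3-specific way to get the generated id


def load_file(conn, doc_id):
    """Return the stored bytes for `doc_id`, or None if there is no such row."""
    cur = conn.cursor()
    cur.execute("SELECT Content FROM Documents WHERE Id = ?", (doc_id,))
    row = cur.fetchone()
    return None if row is None else bytes(row[0])


# Stand-in database; BLOB here plays the role of varbinary(max).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Documents (Id INTEGER PRIMARY KEY, FileName TEXT, Content BLOB)"
)

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.docx"
    sample.write_bytes(b"PK\x03\x04 placeholder bytes, not a real document")
    original = sample.read_bytes()
    doc_id = store_file(conn, sample)

roundtrip_ok = load_file(conn, doc_id) == original
```

Because the bytes are never interpreted, the same two helpers would work for any file type, which is the reuse argument made above.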

Upvotes: 0

Dracorat

Reputation: 1194

Most of the time, if a file is to be stored "as is" in a database, it's stored as a "BLOB" or "Binary Large OBject".

Here's an article on how to read and write BLOB data to MSSQL: http://www.codecapers.com/post/manipulating-blob-data-in-mssql-with-c.aspx

If some part of your document also needs to be searchable from the database, you can create the BLOB as a column and still have additional columns or table relationships for these items (such as categories, keywords, date created, owner, etc.).
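
A minimal sketch of that layout, again in Python with sqlite3 as a stand-in for SQL Server: the blob sits next to ordinary searchable columns, and keywords live in a related table. All table, column, and function names here are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Documents (
        Id       INTEGER PRIMARY KEY,
        Owner    TEXT NOT NULL,
        Category TEXT NOT NULL,
        Content  BLOB NOT NULL          -- the file itself, stored as is
    );
    CREATE TABLE DocumentKeywords (
        DocumentId INTEGER NOT NULL REFERENCES Documents(Id),
        Keyword    TEXT NOT NULL
    );
    """
)


def add_document(conn, owner, category, content, keywords):
    """Store the blob with its metadata and return the new document id."""
    cur = conn.execute(
        "INSERT INTO Documents (Owner, Category, Content) VALUES (?, ?, ?)",
        (owner, category, content),
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO DocumentKeywords (DocumentId, Keyword) VALUES (?, ?)",
        [(doc_id, keyword) for keyword in keywords],
    )
    conn.commit()
    return doc_id


def find_by_keyword(conn, keyword):
    """Return (id, owner) pairs for matching documents without reading any blob."""
    return conn.execute(
        "SELECT d.Id, d.Owner FROM Documents d "
        "JOIN DocumentKeywords k ON k.DocumentId = d.Id "
        "WHERE k.Keyword = ? ORDER BY d.Id",
        (keyword,),
    ).fetchall()


first_id = add_document(conn, "alice", "contracts", b"file bytes 1", ["lease", "2012"])
second_id = add_document(conn, "bob", "reports", b"file bytes 2", ["2012"])
matches = find_by_keyword(conn, "2012")
```

The search touches only the metadata tables, so the binary content is fetched only once a specific document has been picked.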

Upvotes: 1

the_joric

Reputation: 12226

Actually, the Word format (.docx, Word 2007+) is a zip archive containing a bunch of XML files :). I would recommend using a varbinary or text column.
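
To illustrate the zip-of-XML structure, here is a small Python sketch that pulls the main document part (`word/document.xml`) out of a .docx and runs a simple XPath-style query over it. The helper names are hypothetical, and the in-memory archive built by `make_stand_in_docx` is only a stand-in: it contains a single part and is not a document Word would open. The extracted XML is presumably the piece one would put in an xml column if node-level queries are needed.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML elements (w:p, w:r, w:t, ...).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_document_xml(docx_bytes):
    """Return the raw bytes of word/document.xml from a .docx archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read("word/document.xml")


def paragraph_texts(document_xml):
    """Return the text of each paragraph by joining its w:t runs."""
    root = ET.fromstring(document_xml)
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.findall(".//w:t", ns))
        for p in root.findall(".//w:p", ns)
    ]


def make_stand_in_docx():
    """Build a tiny zip holding only word/document.xml (illustration only)."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


docx_bytes = make_stand_in_docx()
document_xml = extract_document_xml(docx_bytes)
texts = paragraph_texts(document_xml)
```

With a real file, `docx_bytes` would simply be the contents of the .docx read from disk.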

Upvotes: 0

MethodMan

Reputation: 18843

You could look into reading the file into a byte[] stream, and also search Google for Microsoft.Office.Interop; there are tons of examples on MSDN.

If you want to serialize the document, look at binary streaming. CodeProject.com and Stack Overflow provide plenty of samples from past questions and solutions.

Upvotes: 0
