user1832196

Reputation: 41

Opening an existing .doc file using ofstream in C++

Assuming I have a file with a .doc extension on the Windows platform, how can I open the file and output its contents to the screen using the ofstream object in C++? I am aware that the object can be used to open files in text and binary modes, but I would like to know whether a .doc (or even .pdf) file can be opened and its contents read.

Upvotes: 4

Views: 4947

Answers (2)

austin

Reputation: 5876

I've never actually done this before, but after reading up on it, I think I might have a suggestion. A .docx file is really a ZIP archive of XML files. After unzipping, the main document body is located at word/document.xml. (The older .doc format is a different, binary format, so this applies to .docx only.) Doing this in a program is where it gets fun.

Two options: If you're using C++ CLR (.NET) then Microsoft has an SDK for you. It should make it pretty easy to open Office documents.

Otherwise if you're just using regular C++, you might have to do some extra work.

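  1. Open the file and unzip it using a ZIP library. zlib on its own only handles the DEFLATE compression, so you will also want something that reads the ZIP container, such as minizip (shipped in zlib's contrib directory) or libzip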
  2. Find the document.xml file inside
  3. Parse the XML document. You'll probably want to use some kind of XML parsing library for this. You'll have to look up the specs for the XML to figure out how to get the text you want.
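
To give a rough idea of step 3, here is a minimal sketch that pulls the visible text out of an already-unzipped word/document.xml held in a string. It relies on the fact that WordprocessingML stores text runs in `<w:t>` elements and paragraphs in `<w:p>` elements. The function name `extract_text` is made up for illustration, and the scanning is deliberately naive: it is not a real XML parser, it does not decode entities such as `&amp;`, and it assumes the usual `w:` namespace prefix. For anything serious, a proper XML library is the safer route.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: collect the character data of every <w:t> element
// and emit a newline at the end of each paragraph (</w:p>).
// Naive tag scanning; assumes well-formed input and the "w:" prefix.
std::string extract_text(const std::string& xml) {
    std::string out;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t close = xml.find('>', pos);
        if (close == std::string::npos) break;  // truncated tag, stop

        // Text between '<' and '>', e.g. "w:t", "/w:p", "w:tab/".
        const std::string tag = xml.substr(pos + 1, close - pos - 1);
        const bool self_closing = !tag.empty() && tag.back() == '/';

        if (tag == "/w:p") {
            out += '\n';  // paragraph boundary
        } else if (!self_closing &&
                   (tag == "w:t" || tag.compare(0, 4, "w:t ") == 0)) {
            // Matches <w:t> and <w:t attr="...">, but not <w:tab/> or <w:tbl>.
            const std::size_t end = xml.find("</w:t>", close);
            if (end == std::string::npos) break;  // unterminated element
            out.append(xml, close + 1, end - close - 1);
            close = end + 5;  // index of the '>' that ends "</w:t>"
        }
        pos = close + 1;
    }
    return out;
}
```

You would call it with the contents of document.xml after extracting that entry from the archive, then print the returned string.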

Upvotes: 2

PiotrNycz

Reputation: 24412

The C++ standard library has the ifstream class, which can be used to read simple text files and binary files too.

It is up to you to interpret the bytes in the file. To properly interpret a binary file you need to know its format.
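
As a small illustration of "knowing the format", the sketch below opens a file with ifstream in binary mode and looks at its first few bytes. It uses three well-known leading signatures: the OLE compound file header used by legacy .doc files, the ZIP local-file header used by .docx, and the `%PDF-` marker. The function names `read_prefix` and `guess_format` and the returned labels are invented for this example. A matching signature only identifies the container; it does not mean the contents are readable as text.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read up to `count` bytes from the start of a file.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_prefix(const std::string& path, std::size_t count) {
    std::ifstream in(path, std::ios::binary);  // binary mode: no newline translation
    if (!in) return {};
    std::vector<unsigned char> buf(count);
    in.read(reinterpret_cast<char*>(buf.data()),
            static_cast<std::streamsize>(count));
    buf.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was read
    return buf;
}

// Hypothetical helper: guess the container format from the leading bytes.
std::string guess_format(const std::vector<unsigned char>& head) {
    static const unsigned char ole[] = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    static const unsigned char zip[] = {'P', 'K', 0x03, 0x04};
    static const unsigned char pdf[] = {'%', 'P', 'D', 'F', '-'};

    auto starts_with = [&head](const unsigned char* sig, std::size_t n) {
        return head.size() >= n && std::equal(sig, sig + n, head.begin());
    };

    if (starts_with(ole, sizeof ole)) return "ole";  // legacy .doc, .xls, .ppt
    if (starts_with(zip, sizeof zip)) return "zip";  // .docx and other ZIP-based files
    if (starts_with(pdf, sizeof pdf)) return "pdf";
    return "unknown";
}
```

Calling `guess_format(read_prefix(path, 8))` on a file gives one of the four labels, which tells you which kind of parser you would need next.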

If you mean MS Word files, I would start here: http://en.wikipedia.org/wiki/Office_Open_XML to understand the format used since MS Word 2007.

You might find the Boost Iostreams library ( http://www.boost.org/doc/libs/1_52_0/libs/iostreams/doc/home.html ) somewhat useful if you want to write a filter yourself.

Upvotes: 1
