Emlyn

Reputation: 730

Diff 2 Open XML Word Documents

Thanks in advance for any help and assistance.

I'm trying to find a utility, or some direction, on how best to compare two Word .docx files (an original and a modified version) for differences and then highlight the changes in the modified version, using C#.

Again many thanks for any assistance you can provide.

Upvotes: 8

Views: 5751

Answers (6)

Todd Main

Reputation: 29155

There are a few OpenXML diff tools listed here.

Upvotes: 0

MChmurski

Reputation: 31

I'll refresh this topic a little. Currently the "Open XML SDK 2.5 Productivity Tool" does the job. I found it very useful for diffing pptx/docx/xlsx files. See: Open XML SDK 2.5

If you're using Visual Studio, you should also consider adding this plugin: Open XML Package Editor for Visual Studio. It's very useful when you need to quickly look into a file or change something.

Upvotes: 0

Dmitry Pavlov

Reputation: 28320

You could use the XMLDiff.exe utility, which is part of the MS 'XML Diff and Patch Tool'.

Read more in MSDN article "Using the XML Diff and Patch Tool in Your Applications".

The download link: Xmldiffpatch.exe (also at the very beginning of the MSDN article).

Upvotes: 2

Ahmad Mageed

Reputation: 96557

The OpenXML SDK 2.0 Toolkit comes with a tool that does this. It's called OpenXMLDiff. You can also read about what else the toolkit offers here: An introduction to Open XML SDK 2.0.

If that's not what you need, then you're going to have to go through each part of the two Open XML packages and determine the differences between them.
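As a rough illustration of that manual route, here is a minimal sketch that lists which parts were added, removed, or changed between two packages. It is written in Python with only the standard library rather than C#, and the function names `part_hashes` and `compare_packages` are made up for this example. It only reports that a part's bytes differ, not what changed inside it.

```python
import hashlib
import zipfile


def part_hashes(docx):
    """Map each part name in the package to a SHA-256 digest of its bytes."""
    with zipfile.ZipFile(docx) as package:
        return {
            name: hashlib.sha256(package.read(name)).hexdigest()
            for name in package.namelist()
        }


def compare_packages(original, modified):
    """Return (added, removed, changed) part names between two packages."""
    old = part_hashes(original)
    new = part_hashes(modified)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    return added, removed, changed
```

Calling `compare_packages("original.docx", "modified.docx")` (hypothetical file names) narrows the search to the parts worth inspecting further, typically `word/document.xml`.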

Upvotes: 6

DaveE

Reputation: 3647

The document content is XML-tagged and broken up depending on whatever options, changes, emphasis, etc. are added, modified, or deleted between saves. Something as simple as adding and then removing a newline can result in a different physical XML structure. There won't be any difference in the final visible content, but the XML might be different.

What exactly counts as a 'difference' you want to identify? That'll determine how much parsing of the internal structure you need to do and what techniques or tools you can use to identify the differences.
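If "difference" means a change in the visible text, one way to sidestep the run-splitting problem is to join the text of all runs in each paragraph before comparing. The sketch below does that in Python with the standard library (not C#). The names `paragraph_texts` and `text_changes` are hypothetical, and the code makes simplifying assumptions: it reads only `word/document.xml`, looks only at `w:t` elements (so tabs, line breaks, and tracked deletions are ignored), and would count text twice for paragraphs nested inside other paragraphs, such as text boxes.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_texts(docx):
    """Return one string per paragraph, ignoring how the text is split into runs."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(W + "t"))
        for p in root.iter(W + "p")
    ]


def text_changes(original, modified):
    """Return (tag, old_paragraphs, new_paragraphs) for every non-equal block."""
    old = paragraph_texts(original)
    new = paragraph_texts(modified)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Each tuple's tag is one of `replace`, `delete`, or `insert`, which is enough to decide which paragraphs in the modified document would need highlighting.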

Upvotes: 0

Gishu

Reputation: 136663

A docx file is a renamed zip file. You could rename it to .zip and extract it.

However, a docx is not a zip of a single file; it's a folder hierarchy with XML files in it. So you could extract it and script a comparison utility like Beyond Compare to get the differences.
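A scripted stand-in for the extract-and-compare step might look like the sketch below, which uses Python's standard library instead of an external tool. It reads one part straight out of each zip, pretty-prints the XML so it diffs line by line, and produces a unified diff. The function names `pretty_part` and `diff_part` are invented for this example, and the output is a raw XML diff, so it will also report purely structural changes that do not affect the visible text.

```python
import difflib
import zipfile
from xml.dom import minidom


def pretty_part(docx, part="word/document.xml"):
    """Read one part from the package and return it as indented XML lines."""
    with zipfile.ZipFile(docx) as package:
        xml_bytes = package.read(part)
    return minidom.parseString(xml_bytes).toprettyxml(indent="  ").splitlines()


def diff_part(original, modified, part="word/document.xml"):
    """Return unified-diff lines for the same part in two packages."""
    return list(
        difflib.unified_diff(
            pretty_part(original, part),
            pretty_part(modified, part),
            fromfile="original/" + part,
            tofile="modified/" + part,
            lineterm="",
        )
    )
```

Printing `"\n".join(diff_part("original.docx", "modified.docx"))` (hypothetical file names) gives output similar to what a text comparison tool would show for the extracted files.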

I'm not sure how you would present the differences, though. Do you want to visually show the differences in the Word documents, e.g. "this paragraph is missing in the second file"?

Upvotes: 0
