Reputation: 730
Thanks in advance for any help and assistance.
I'm trying to find a utility, or some direction, on how best to compare two Word .docx files (the original and a modified version) for differences, and then highlight the changes in the modified version, in C#.
Again many thanks for any assistance you can provide.
Upvotes: 8
Views: 5751
Reputation: 31
I'll refresh this topic a little. Currently the "Open XML SDK 2.5 Productivity Tool" does this. I've found it very useful for diffing .pptx/.docx/.xlsx files. Download: Open XML SDK 2.5
If you're using Visual Studio, you should also consider adding this plugin: Open XML Package Editor for Visual Studio. It's very useful when you need to quickly look inside a file or change something.
Upvotes: 0
Reputation: 28320
You could use the XMLDiff.exe utility, which is part of Microsoft's 'XML Diff and Patch Tool'.
Read more in the MSDN article "Using the XML Diff and Patch Tool in Your Applications".
Download link: Xmldiffpatch.exe (also at the very beginning of the MSDN article).
Upvotes: 2
Reputation: 96557
The OpenXML SDK 2.0 Toolkit comes with a tool that does this. It's called OpenXMLDiff. You can also read about what else the toolkit offers here: An introduction to Open XML SDK 2.0.
If that's not what you need, you'll have to walk through each part of the two Open XML packages and work out the differences between them yourself.
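A minimal sketch of that part-by-part walk, written in Python for brevity (the same idea maps onto `System.IO.Compression.ZipArchive` in C#). It assumes the inputs are ordinary ZIP-based Open XML packages and only reports which parts were added, removed, or changed byte-for-byte; `part_digests` and `compare_packages` are hypothetical helper names, not Open XML SDK APIs.

```python
import hashlib
import zipfile


def part_digests(path):
    """Map each part name in a .docx/.xlsx/.pptx package to a SHA-256 of its bytes."""
    with zipfile.ZipFile(path) as package:
        return {
            name: hashlib.sha256(package.read(name)).hexdigest()
            for name in package.namelist()
            if not name.endswith("/")  # skip directory entries, if any are present
        }


def compare_packages(original_path, modified_path):
    """Return (added, removed, changed) part names between two Open XML packages."""
    old = part_digests(original_path)
    new = part_digests(modified_path)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Parts present in both packages whose raw bytes differ.
    changed = sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
    return added, removed, changed
```

For example, `compare_packages("original.docx", "modified.docx")`. Expect some noise: a byte-level change does not necessarily mean a content change, and a part such as `docProps/core.xml` (which records the last-modified time) will typically differ after any save.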
Upvotes: 6
Reputation: 3647
The document content is XML-tagged and split up according to whatever options, edits, emphasis, etc. are added, modified, or deleted between saves. Something as simple as adding and then removing a newline can produce a different physical XML structure: the visible content ends up identical, but the XML does not.
What exactly counts as a 'difference' you want to identify? That'll determine how much parsing of the internal structure you need to do and what techniques or tools you can use to identify the differences.
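One possible definition of 'difference' is visible text only. The Python sketch below assumes that only `<w:t>` text inside `word/document.xml` matters; it joins all runs of each paragraph, so a paragraph that Word split into runs differently still compares equal. `paragraph_texts`, `one_run`, and `two_runs` are hypothetical illustrations, not part of any library.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_texts(document_xml):
    """Return the text of each <w:p>, joining all of its <w:t> elements.

    Simplification: tabs, breaks and fields are ignored, and text in nested
    paragraphs (e.g. text boxes) is also counted in the enclosing paragraph.
    """
    root = ET.fromstring(document_xml)
    return ["".join(t.text or "" for t in p.iter(W + "t")) for p in root.iter(W + "p")]


NS = 'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
# Same visible text, different run structure (the second run is bold).
one_run = f'<w:document {NS}><w:body><w:p><w:r><w:t>Hello world</w:t></w:r></w:p></w:body></w:document>'
two_runs = (
    f'<w:document {NS}><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">Hello </w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

if __name__ == "__main__":
    print(one_run == two_runs)  # False: the XML differs
    print(paragraph_texts(one_run) == paragraph_texts(two_runs))  # True: the text does not
```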
Upvotes: 0
Reputation: 136663
A .docx file is a renamed ZIP file, so you can rename it to .zip and extract it.
However, a .docx is not a zip of a single file; it's a folder hierarchy of XML files. So you could extract both documents and script a comparison utility like Beyond Compare to find the differences.
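A sketch of the extract-then-compare step in Python, assuming an external folder-diff tool does the actual comparison. Because Word tends to write each XML part on very few lines, it pretty-prints `.xml` and `.rels` parts so a line-based tool has something readable to align. `explode_for_diff` is a hypothetical name, and the output is meant for reading only, not for zipping back into a .docx.

```python
import pathlib
import zipfile
from xml.dom import minidom


def explode_for_diff(docx_path, out_dir):
    """Extract every part of a .docx into out_dir, pretty-printing XML parts."""
    root = pathlib.Path(out_dir).resolve()
    with zipfile.ZipFile(docx_path) as package:
        for info in package.infolist():
            if info.is_dir():
                continue
            target = (root / info.filename).resolve()
            # Refuse member names such as "../x" that would escape out_dir.
            if root not in target.parents:
                raise ValueError(f"unsafe member name: {info.filename}")
            data = package.read(info.filename)
            if info.filename.endswith((".xml", ".rels")):
                # One element per line; returns bytes because an encoding is given.
                data = minidom.parseString(data).toprettyxml(indent="  ", encoding="utf-8")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
```

Run it once per file, e.g. `explode_for_diff("original.docx", "original_parts")` and `explode_for_diff("modified.docx", "modified_parts")`, then point Beyond Compare (or `diff -r`) at the two folders.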
I'm not sure how you would present the differences, though. Do you want to show them visually in the Word documents? e.g. "this paragraph is missing in the second file", etc.
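If the goal is a report like "this paragraph is missing in the second file", one hedged approach is to diff the two documents' paragraph texts as sequences with Python's `difflib.SequenceMatcher`. The sketch assumes you already have each document's paragraphs as a list of strings (however you extract them); `paragraph_changes` is a hypothetical helper, and an edited paragraph is reported as one missing plus one new paragraph rather than as an in-place change.

```python
import difflib


def paragraph_changes(original, modified):
    """Compare two lists of paragraph strings; yield (kind, paragraph_number, text).

    kind is "missing" for paragraphs only in the original and "new" for
    paragraphs only in the modified version (numbers are 1-based).
    """
    matcher = difflib.SequenceMatcher(a=original, b=modified, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            for i in range(i1, i2):
                yield "missing", i + 1, original[i]
        if tag in ("insert", "replace"):
            for j in range(j1, j2):
                yield "new", j + 1, modified[j]


if __name__ == "__main__":
    before = ["Title", "First paragraph.", "Second paragraph.", "Closing."]
    after = ["Title", "Second paragraph.", "Closing.", "P.S. added later."]
    for kind, number, text in paragraph_changes(before, after):
        print(f"{kind:>7} (paragraph {number}): {text}")
```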
Upvotes: 0