mche

Reputation: 656

Fixing "Word found unreadable content in corrupt..." programmatically

I'm getting an OpenXml-generated docx file from another system. When I try to open the file in my application using Microsoft.Office.Interop.Word.Application.Open(filename), I get a "The file appears to be corrupted" exception.

When I manually open the docx file, I'm greeted with this prompt: "Word found unreadable content in corrupt xxx.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes." When I click Yes, Word is able to recover the document in a new unsaved Word file.

I have tried comparing the original corrupt.docx file's document.xml with the recovered.docx file's document.xml. While there are many formatting changes between the two document.xml files (extra space between closing XML tags), the main differences were that the AltChunk was actually embedded into recovered.docx and that several empty "run" tags were removed. I'm not sure what would cause the file to be considered corrupt, as neither of those seems like it should.

That said, is there a way to programmatically run, from my application, whatever process happens when I click Yes on that "...Do you want to recover the contents of this document?..." prompt? That would be ideal. Less preferably, is there a way to tell which parts of the XML are actually corrupt in a Word document?

Upvotes: 0

Views: 9751

Answers (2)

PouriaDiesel

Reputation: 735

I had this problem after Microsoft Word crashed and my document became corrupted. I found this way to recover it:

  1. Upload it to Google Drive and open it with Google Docs.
  2. Google Docs opened my document, but some elements, such as borders and equations, were corrupted; I fixed those manually later.
  3. Copy the last line of the document and remove it.
  4. Download the document and paste the last line back into it.

Upvotes: 0

Cindy Meister

Reputation: 25693

That said, is there a way to run whatever process happens when I click Yes to that ...Do you want to recover the contents of this document?... prompt programmatically through my application; this would be the ideal? Less preferably, is there a way to tell what parts of the xml is actually corrupting in a word doc?

  1. No, that's not exposed to the outside.
  2. Theoretically, validation could be possible. But given there's an AltChunk involved, that might not turn up the problem. The content of an AltChunk isn't integrated into the document until Word opens and processes the file. If what's coming in "breaks" something at that point, validating the package beforehand won't pick that up.
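As a rough first pass before any real validation, it can help to check that every XML part in the package at least parses, and to count how many AltChunk elements the main document contains. The following is a minimal Python sketch under stated assumptions: the function name `inspect_docx` is made up for illustration, and this is only a well-formedness check, not schema validation, so it can't detect the kind of problem that only appears once Word integrates the AltChunk.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements such as w:altChunk.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def inspect_docx(path):
    """Return (parse_errors, alt_chunk_count) for a .docx package.

    parse_errors maps part name -> parser message for parts that are not
    well-formed XML. This is NOT schema validation.
    """
    parse_errors = {}
    alt_chunks = 0
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            # Only XML parts and relationship parts are parsed.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                root = ET.fromstring(package.read(name))
            except ET.ParseError as exc:
                parse_errors[name] = str(exc)
                continue
            if name == "word/document.xml":
                alt_chunks = len(list(root.iter(f"{{{W_NS}}}altChunk")))
    return parse_errors, alt_chunks
```

Calling `inspect_docx("corrupt.docx")` and finding no parse errors but a nonzero AltChunk count would be consistent with the point above: the package itself is well-formed, and the trouble only shows up when the imported content is merged.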

In this particular case, I might try removing the AltChunk manually (the pieces are in a few places in the zip file) and see if the file can open without it. But if you're not intimately familiar with the Word Open XML zip package it might be better to ask the producer/source of the document.
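For the "remove the AltChunk and see if it opens" experiment, something like the following Python sketch could produce a test copy. It rests on several assumptions: `strip_alt_chunks` is a hypothetical helper, the main part is `word/document.xml` encoded as UTF-8, and the WordprocessingML namespace uses the conventional `w` prefix. It only removes the `w:altChunk` elements from the document body; the matching relationship entry and the embedded part are left in the package untouched, so this is a diagnostic copy rather than a cleaned-up file.

```python
import re
import zipfile

# Matches <w:altChunk .../> and <w:altChunk ...>...</w:altChunk>.
# Assumes the conventional "w" prefix; \b keeps it from matching w:altChunkPr.
ALT_CHUNK = re.compile(
    r"<w:altChunk\b[^>]*?/>|<w:altChunk\b[^>]*>.*?</w:altChunk>",
    re.DOTALL,
)


def strip_alt_chunks(src, dst):
    """Copy the package at src to dst without altChunk elements in the body.

    Returns the number of altChunk elements removed. The text is edited
    with a regex rather than re-serialized by an XML library, so the
    namespace prefixes in document.xml stay exactly as they were.
    """
    removed = 0
    with zipfile.ZipFile(src) as old, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as new:
        for item in old.infolist():
            data = old.read(item.filename)
            if item.filename == "word/document.xml":
                # Assumption: the part is UTF-8 encoded.
                text, removed = ALT_CHUNK.subn("", data.decode("utf-8"))
                data = text.encode("utf-8")
            new.writestr(item.filename, data)
    return removed
```

If the copy written by `strip_alt_chunks("corrupt.docx", "no_altchunk.docx")` opens cleanly, that points at the imported content rather than the surrounding document; if it still fails, the cause is elsewhere in the package.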

Upvotes: 1
