Reputation: 287
I have a third-party library that creates an xlsx file. It doesn't use the Open XML SDK; it assembles the file from fragments of XML markup and zips them with the ZipArchive class. But when I try to open the result with the Open XML SDK:
var document = SpreadsheetDocument.Open(fileStream, false);
it fails with error:
DocumentFormat.OpenXml.Packaging.OpenXmlPackageException: 'The specified package is invalid. The main part is missing.'
MS Excel opens this file normally, and resaving it from Excel fixes the problem.
I also tried unzipping the files and zipping them again (without any changes); after that, the code above works.
Where is the problem? How should an xlsx file be zipped so that the Open XML SDK can open it?
SOLUTION
The problem was in how the third-party library saved the file. The entries in the zip archive had names containing \ instead of /. The library's code was edited to fix that, and everything works now.
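For illustration only, here is a minimal sketch of that kind of fix, assuming the library adds parts through ZipArchive.CreateEntry. The class and method names (XlsxZipWriter, ToEntryName, AddPart) are hypothetical and are not taken from the library in question.

```csharp
using System.IO;
using System.IO.Compression;

static class XlsxZipWriter
{
    // Hypothetical helper: turns a relative file path into a zip entry name
    // that uses "/" as the separator and has no leading slash.
    public static string ToEntryName(string relativePath)
    {
        return relativePath.Replace('\\', '/').TrimStart('/');
    }

    // Hypothetical helper: writes one part into the archive under the normalized name.
    public static void AddPart(ZipArchive archive, string relativePath, byte[] content)
    {
        ZipArchiveEntry entry = archive.CreateEntry(ToEntryName(relativePath));
        using (Stream target = entry.Open())
        {
            target.Write(content, 0, content.Length);
        }
    }
}
```

The point is only that the name passed to CreateEntry is stored as given, so the separator has to be normalized before the entry is created.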
Upvotes: 2
Views: 3434
Reputation: 1151
In my case, I found the following XML at {root}/xl/_rels/workbook.xml.rels:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId3" Target="styles.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"/>
<Relationship Id="rId2" Target="theme/theme1.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme"/>
<Relationship Id="rId1" Target="worksheets/sheet1.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"/>
<Relationship Id="rId5" Target="calcChain.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/calcChain"/>
<Relationship Id="rId4" Target="sharedStrings.xml" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings"/>
</Relationships>
The problem is that the file calcChain.xml does not exist in the xl folder.
Excel seems to be more tolerant: it ignores the missing reference and opens the workbook just fine. When you save the workbook, Excel corrects the problem, apparently by regenerating all the metadata in strict compliance with the standard. That is how I found the issue: I saved the original to a copy, renamed both workbooks to .zip, and painstakingly compared the XML files in the two archives. This discrepancy turned up. Once I corrected it in the ORIGINAL file, my program was able to open the original (fixed) workbook.
Does anyone know if there is a way to use the OpenXML libraries to trap and correct / suppress these missing references on the fly, when opening the document? (The way Excel must be doing.)
Upvotes: 0
Reputation: 5723
After some research I found people complaining about this exception in two scenarios:
Since you open the file from a stream, the second cause probably does not apply in this case.
If font usage is not the cause, try to manually compare file versions before and after saving with Excel in Open XML Productivity Tool (https://www.microsoft.com/en-us/download/details.aspx?id=30425).
If there are no differences in documents' contents, try to compare archive compression settings.
UPDATE
It seems I've found some more information about the issue that can help to find the solution.
I was able to reproduce the "The main part is missing." error by creating the archive with:
ZipFile.CreateFromDirectory(@"C:\DirToCompress", destFilePath, CompressionLevel.Fastest, false);
Then I checked that opening the file with Package.Open(destFilePath, FileMode.Open, FileAccess.Read) listed 0 parts in the file.
After verifying some differences, I noticed that in the correct xlsx file, entries nested within folders in the archive have FullName paths written with the / character, for example: _rels/.rels. In the corrupted file, the names were written with the \ character, for example: _rels\.rels.
You can investigate this by opening the file with the ZipArchive class (for example: new ZipArchive(archiveStream, ZipArchiveMode.Read, false, Encoding.UTF8);) and inspecting the Entries collection.
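A check like that could be sketched as follows. The class and method names (XlsxEntryInspector, FindBackslashEntries) are made up for illustration; only ZipArchive, Entries and FullName come from the framework.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class XlsxEntryInspector
{
    // Returns the full names of all entries whose name contains a backslash.
    // The stream is left open so the caller stays in control of its lifetime.
    public static List<string> FindBackslashEntries(Stream archiveStream)
    {
        using (var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries
                .Select(entry => entry.FullName)
                .Where(name => name.IndexOf('\\') >= 0)
                .ToList();
        }
    }
}
```

For a file on disk you could pass File.OpenRead(path) as the stream. An empty result means no entry name contains a backslash, which rules out this particular cause.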
The important thing to note is that there are naming rules for parts described in the Office Open XML specification: https://www.ecma-international.org/news/TC45_current_work/Office%20Open%20XML%20Part%202%20-%20Open%20Packaging%20Conventions.pdf
As a test, I wrote some code that opens the corrupted xlsx file with the ZipArchive class and rewrites each entry by copying its contents and replacing \ with / in the name of the recreated entry. After this operation, the resulting file seems to be opened correctly by the SpreadsheetDocument.Open(...) method.
Please note that the name-fixing method I used was very simple and may be insufficient or work incorrectly in some scenarios. However, these notes may help in finding a proper solution for the issue.
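A rewrite along the lines described above might look like the following sketch. It is not the original test code; the class and method names (XlsxEntryNameFixer, RewriteWithForwardSlashes) are hypothetical, and it only handles the simple case of replacing \ with / in entry names.

```csharp
using System.IO;
using System.IO.Compression;

static class XlsxEntryNameFixer
{
    // Copies every entry from the source archive into a new archive written to target,
    // replacing "\" with "/" in each entry name. Both streams are left open.
    public static void RewriteWithForwardSlashes(Stream source, Stream target)
    {
        using (var input = new ZipArchive(source, ZipArchiveMode.Read, leaveOpen: true))
        using (var output = new ZipArchive(target, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in input.Entries)
            {
                string fixedName = entry.FullName.Replace('\\', '/');
                ZipArchiveEntry copy = output.CreateEntry(fixedName);

                // In Create mode only one entry may be open at a time,
                // so each copy is finished before the next entry is created.
                using (Stream from = entry.Open())
                using (Stream to = copy.Open())
                {
                    from.CopyTo(to);
                }
            }
        }
    }
}
```

After the call, rewind the target stream (or reopen the target file) before passing it to SpreadsheetDocument.Open(...).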
Upvotes: 1