I am trying to delete all embedded objects from Word and PowerPoint files using the Open XML SDK. I am new to Open XML and not sure whether I am doing this correctly. Below is the code I have. My intention is to remove any embedded objects and to delete embedded images. Both snippets produce errors when executed.
Code I tried for deleting all embedded items in the document:
using (var wdDoc = WordprocessingDocument.Open(wordFilePath, true))
{
    var docPart = wdDoc.MainDocumentPart;
    var document = docPart.Document;
    var embeddedObjectsCount = docPart.EmbeddedObjectParts.Count();
    while (embeddedObjectsCount > 0)
    {
        docPart.DeletePart(docPart.EmbeddedObjectParts.FirstOrDefault());
        embeddedObjectsCount = docPart.EmbeddedObjectParts.Count();
    }
}
Code I tried for deleting all images in the document (this partially works if the document has no embedded objects):
using (var wdDoc = WordprocessingDocument.Open(wordFilePath, true))
{
    var docPart = wdDoc.MainDocumentPart;
    var document = docPart.Document;
    var imageObjectsCount = docPart.ImageParts.Count();
    while (imageObjectsCount > 0)
    {
        docPart.DeletePart(docPart.ImageParts.FirstOrDefault());
        imageObjectsCount = docPart.ImageParts.Count();
    }
}
When I run the code above, the file becomes corrupted. How can I remove all embedded objects from a Word document without corrupting the file?
I haven't done anything for PowerPoint yet, but I expect it to be similar to Word.
I managed to find a solution to my problem. I had to dive into the concepts of the Open XML SDK to get there. However, I am not sure whether this is the optimal solution.
Goal
Remove all embedded objects in PowerPoint and Word.
Remove all images in PowerPoint and Word.
For Word
//using System.Linq;
//using DocumentFormat.OpenXml.Packaging;
//using Ovml = DocumentFormat.OpenXml.Vml.Office;
using (var wdDoc = WordprocessingDocument.Open(wordFilePath, true))
{
    var docPart = wdDoc.MainDocumentPart;
    var document = docPart.Document;
    //Determine whether there are any embedded objects in the document.
    //ToList() takes a snapshot, so removing elements cannot disturb a live Descendants() enumeration.
    var oleObjects = document.Body.Descendants<Ovml.OleObject>().ToList();
    if (oleObjects.Any())
    {
        foreach (var oleObj in oleObjects)
        {
            oleObj.Remove(); //Remove each OLE object in the document. This removes the object from view in Word.
        }
        //Delete the embedded objects. This removes the actual attached files from the document.
        docPart.DeleteParts(docPart.EmbeddedObjectParts.ToList());
        //Delete all pictures in the document
        docPart.DeleteParts(docPart.ImageParts.ToList());
    }
}
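A likely source of remaining corruption: the Word code above deletes every ImagePart, while pictures in the body (a w:drawing containing an a:blip, or VML v:imagedata inside w:pict or w:object) reference those parts by relationship id, so deleting a part that is still referenced leaves a dangling r:id. The following is a minimal sketch of a more defensive order of operations, assuming the DocumentFormat.OpenXml NuGet package (2.x or 3.x). It removes the referencing markup first, then deletes only the embedded-object, embedded-package and image parts that nothing in document.xml still points at. WordCleanup and RemoveObjectsAndPictures are hypothetical names, and the sketch only covers the main document part; headers, footers and footnotes have their own parts and relationships.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;
using V = DocumentFormat.OpenXml.Vml;
using W = DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper (not part of the Open XML SDK).
public static class WordCleanup
{
    // Namespace of the r:id / r:embed / r:link attributes that point at related parts.
    const string RelAttrNs = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    public static void RemoveObjectsAndPictures(WordprocessingDocument wdDoc)
    {
        MainDocumentPart docPart = wdDoc.MainDocumentPart;
        W.Document document = docPart.Document;

        // 1. Snapshot, then remove, the markup that references the parts:
        //    w:object  = OLE object plus its VML preview image
        //    w:drawing = DrawingML content; only those containing an a:blip (pictures)
        //    w:pict    = VML content; only those containing v:imagedata (pictures)
        List<OpenXmlElement> toRemove = document.Descendants<W.EmbeddedObject>().Cast<OpenXmlElement>()
            .Concat(document.Descendants<W.Drawing>().Where(d => d.Descendants<A.Blip>().Any()))
            .Concat(document.Descendants<W.Picture>().Where(p => p.Descendants<V.ImageData>().Any()))
            .ToList();
        foreach (OpenXmlElement element in toRemove)
        {
            element.Remove();
        }

        // 2. Collect every relationship id still referenced anywhere in document.xml.
        var stillReferenced = new HashSet<string>(
            document.Descendants()
                .SelectMany(e => e.GetAttributes())
                .Where(a => a.NamespaceUri == RelAttrNs)
                .Select(a => a.Value));

        // 3. Delete only the candidate parts that nothing points at any more.
        List<OpenXmlPart> unreferenced = docPart.EmbeddedObjectParts.Cast<OpenXmlPart>()
            .Concat(docPart.EmbeddedPackageParts)
            .Concat(docPart.ImageParts)
            .Where(part => !stillReferenced.Contains(docPart.GetIdOfPart(part)))
            .ToList();
        foreach (OpenXmlPart part in unreferenced)
        {
            docPart.DeletePart(part);
        }

        document.Save();
    }
}
```

It would be called as WordCleanup.RemoveObjectsAndPictures(wdDoc) inside the same using block. Keeping parts that are still referenced (for example a page-background image) trades completeness for a file that should open without a repair prompt.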
For PowerPoint
//using System.Linq;
//using DocumentFormat.OpenXml.Packaging;
//using DocumentFormat.OpenXml.Presentation;
using (var ppt = PresentationDocument.Open(powerPointFilePath, true))
{
    var slides = ppt.PresentationPart.SlideParts;
    foreach (var slide in slides)
    {
        //Remove OLE objects: each p:oleObj sits inside a p:graphicFrame, so remove the whole frame.
        //Collect the frames first, skipping OLE objects without a frame and duplicate frames,
        //so the loop always terminates and the tree is not modified while it is being enumerated.
        var oleObjGraphicFrames = slide.Slide.Descendants<OleObject>()
            .Select(oleObj => oleObj.Ancestors<GraphicFrame>().FirstOrDefault())
            .Where(frame => frame != null)
            .Distinct()
            .ToList();
        foreach (var oleObjGraphicFrame in oleObjGraphicFrames)
        {
            oleObjGraphicFrame.Remove();
        }
        //Delete embedded objects
        slide.DeleteParts(slide.EmbeddedObjectParts.ToList());
        //Delete all pictures
        slide.DeleteParts(slide.ImageParts.ToList());
    }
}
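The slide loop above deletes every ImagePart on a slide, including images still used by picture shapes (p:pic, whose a:blip r:embed names the part), which can leave dangling relationship ids. Below is a minimal per-slide sketch of removing the referencing markup first, assuming the DocumentFormat.OpenXml NuGet package. SlideCleanup and RemoveObjectsAndPictures are hypothetical names; the sketch only touches slide parts, not layouts, masters or notes.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using P = DocumentFormat.OpenXml.Presentation;

// Hypothetical helper (not part of the Open XML SDK).
public static class SlideCleanup
{
    // Namespace of the r:id / r:embed / r:link attributes that point at related parts.
    const string RelAttrNs = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    public static void RemoveObjectsAndPictures(PresentationDocument ppt)
    {
        foreach (SlidePart slidePart in ppt.PresentationPart.SlideParts.ToList())
        {
            P.Slide slide = slidePart.Slide;

            // 1. Remove picture shapes (p:pic) and the p:graphicFrame around each OLE object.
            //    Distinct() guards against one frame holding several p:oleObj elements.
            List<OpenXmlElement> toRemove = slide.Descendants<P.Picture>().Cast<OpenXmlElement>()
                .Concat(slide.Descendants<P.OleObject>()
                    .Select(ole => ole.Ancestors<P.GraphicFrame>().FirstOrDefault())
                    .Where(frame => frame != null))
                .Distinct()
                .ToList();
            foreach (OpenXmlElement element in toRemove)
            {
                element.Remove();
            }

            // 2. Relationship ids still referenced on the slide (e.g. a background image).
            var stillReferenced = new HashSet<string>(slide.Descendants()
                .SelectMany(e => e.GetAttributes())
                .Where(a => a.NamespaceUri == RelAttrNs)
                .Select(a => a.Value));

            // 3. Delete only the candidate parts nothing on the slide points at any more.
            List<OpenXmlPart> unreferenced = slidePart.EmbeddedObjectParts.Cast<OpenXmlPart>()
                .Concat(slidePart.EmbeddedPackageParts)
                .Concat(slidePart.ImageParts)
                .Where(part => !stillReferenced.Contains(slidePart.GetIdOfPart(part)))
                .ToList();
            foreach (OpenXmlPart part in unreferenced)
            {
                slidePart.DeletePart(part);
            }

            slide.Save();
        }
    }
}
```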
In my experience, the fastest way to "corrupt" an OpenXML document is to leave a bad relationship pointer. The fastest way to get a handle on what's behind those cryptic error messages is to go straight to the raw OpenXML markup.
To get an idea of what is happening:
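1. Make a copy of your original file and name it A.docx.
2. Run your code against another copy named B.docx.
3. Rename A.docx and B.docx to A.zip and B.zip.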
Investigate the source file
First, inside A.zip, open the file called [Content_Types].xml. Take note of the parts that you would like to remove. Think of this file as a declaration to the word processor of the types of files it will encounter in the sub-directories.
Parts such as the document content (word/document.xml) or the footnotes part (word/footnotes.xml) have their own relationship parts, stored in a _rels sub-folder next to the part and named [part file name].rels.
For example, document.xml.rels holds relationship information for things like charts, hyperlinks, and images in document.xml; likewise, footnotes.xml.rels holds information on things like hyperlinks in footnotes.xml.
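Checking those relationship files by hand gets tedious, so here is a minimal C# sketch that uses only System.IO.Compression and System.Xml.Linq. It reads every .rels entry straight from the package (a .docx opens fine as a ZipArchive, so renaming is optional), resolves each internal Target relative to the part that owns the .rels file, and reports targets that do not exist in the zip, i.e. the bad relationship pointers. PackageInspector and FindDanglingRelationships are hypothetical names, and the sketch assumes targets are plain relative or absolute paths without fragments.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: lists internal relationships whose target part is missing.
public static class PackageInspector
{
    static readonly XNamespace RelNs = "http://schemas.openxmlformats.org/package/2006/relationships";

    public static List<string> FindDanglingRelationships(ZipArchive package)
    {
        var entries = new HashSet<string>(package.Entries.Select(e => e.FullName), StringComparer.OrdinalIgnoreCase);
        var problems = new List<string>();

        foreach (ZipArchiveEntry relsEntry in package.Entries.Where(e =>
                     e.FullName.Contains("_rels/") && e.FullName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase)))
        {
            // "word/_rels/document.xml.rels" describes "word/document.xml";
            // "_rels/.rels" describes the package root ("").
            string relsPath = relsEntry.FullName;
            string folder = relsPath.Substring(0, relsPath.LastIndexOf("_rels/", StringComparison.Ordinal));
            string fileName = relsPath.Substring(relsPath.LastIndexOf('/') + 1);
            string sourcePart = folder + fileName.Substring(0, fileName.Length - ".rels".Length);
            var baseUri = new Uri("http://package/" + sourcePart); // dummy host, used only to resolve paths

            XDocument rels;
            using (var stream = relsEntry.Open())
                rels = XDocument.Load(stream);

            foreach (XElement rel in rels.Root.Elements(RelNs + "Relationship"))
            {
                if ((string)rel.Attribute("TargetMode") == "External")
                    continue; // e.g. hyperlinks point outside the package
                string target = Uri.UnescapeDataString(
                    new Uri(baseUri, (string)rel.Attribute("Target")).AbsolutePath.TrimStart('/'));
                if (!entries.Contains(target))
                    problems.Add($"{relsPath}: {(string)rel.Attribute("Id")} -> {target}");
            }
        }
        return problems;
    }
}
```

Opening B.docx with ZipFile.OpenRead and getting an empty list back means every internal relationship still points at a part that exists; any line it returns names the .rels file and relationship id to look at.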
Investigate the result file
Now open B.zip and compare the [Content_Types].xml files. Do you see a part there that you intended to delete? Is there a part missing that you did not intend to delete?
Inside the word sub-directory in B.zip, do you see any embedded files that are not listed in the [Content_Types].xml file?
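The comparison can also be scripted. One detail matters: [Content_Types].xml declares parts in two ways, a Default element that covers every file with a given extension and an Override element that names one part, so a file missing from the Override list is not necessarily undeclared. Here is a minimal sketch under that reading, again using only System.IO.Compression and System.Xml.Linq; ContentTypesCheck and Check are hypothetical names, and it assumes PartName values are not percent-encoded.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: compares [Content_Types].xml against the actual zip entries.
public static class ContentTypesCheck
{
    static readonly XNamespace CtNs = "http://schemas.openxmlformats.org/package/2006/content-types";

    public static (List<string> Undeclared, List<string> MissingOverrides) Check(ZipArchive package)
    {
        XDocument types;
        using (Stream stream = package.GetEntry("[Content_Types].xml").Open())
            types = XDocument.Load(stream);

        // <Default Extension="png" .../> covers every part with that extension.
        var defaults = new HashSet<string>(
            types.Root.Elements(CtNs + "Default").Select(d => (string)d.Attribute("Extension")),
            StringComparer.OrdinalIgnoreCase);
        // <Override PartName="/word/document.xml" .../> covers one part; PartName starts with '/'.
        var overrides = new HashSet<string>(
            types.Root.Elements(CtNs + "Override").Select(o => ((string)o.Attribute("PartName")).TrimStart('/')),
            StringComparer.OrdinalIgnoreCase);

        List<string> parts = package.Entries
            .Select(e => e.FullName)
            .Where(name => !name.EndsWith("/", StringComparison.Ordinal) && name != "[Content_Types].xml")
            .ToList();

        // Files in the zip that no Default or Override declares.
        List<string> undeclared = parts
            .Where(name => !overrides.Contains(name)
                           && !defaults.Contains(Path.GetExtension(name).TrimStart('.')))
            .ToList();

        // Overrides that still declare a part which is no longer in the zip.
        var partSet = new HashSet<string>(parts, StringComparer.OrdinalIgnoreCase);
        List<string> missingOverrides = overrides.Where(p => !partSet.Contains(p)).ToList();

        return (undeclared, missingOverrides);
    }
}
```

Running it on both A.zip and B.zip and diffing the two results narrows the comparison down to the parts your code actually touched.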
If you take a look at the raw markup and the error doesn't jump out at you, feel free to comment with some more details about your file structure, and we can troubleshoot from there.