user1841294

Reputation: 21

Delete MS Word document portion using Open XML WordprocessingDocument

I'm using C# and the OpenXml DLL to modify an existing MS Word document. I can replace some tags in the document and save the changes, but I haven't yet been able to delete a portion of the text.

For example, my document has many headings (Heading1 text style), each followed by body text, and I'd like to programmatically delete given headings together with all the text that follows them, up to the next heading.

Example original document:

Heading 1 Body text 1 ... ...

Heading 2 Body text 2 ... ...

Heading 3 Body text 3 ... ...

If the user wants to delete Heading 2, the output document should be:

Heading 1 Body text 1 ... ...

Heading 3 Body text 3 ... ...

Am I going about this the right way? Does anyone have an idea how to do it?

Upvotes: 1

Views: 2826

Answers (2)

user1841294

Reputation: 21

I include the code I used to solve the problem:

        // Requires: System.Collections.Generic, System.Linq, System.Text,
        // DocumentFormat.OpenXml, DocumentFormat.OpenXml.Packaging, DocumentFormat.OpenXml.Wordprocessing
        List<OpenXmlElement> ElementsToDeleteList = new List<OpenXmlElement>();
        bool IsParagraphsToDelete = false;
        ...
        // Execute headings removal
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(sOutputFileName, true))
        {
            // Walk the document in order. The flag is set when a heading marked for removal
            // is found and stays set until the next heading is reached.
            foreach (OpenXmlElement element in wordDoc.MainDocumentPart.RootElement.Descendants())
            {
                if (element is Paragraph paragraph)
                {
                    // Style id of the paragraph, or null if any link in the chain is missing
                    string styleId = paragraph.ParagraphProperties?.ParagraphStyleId?.Val?.Value;
                    if (styleId != null &&
                        (styleId.ToLower().Contains(MainHeaderStyleName.ToLower()) ||
                         styleId.ToLower().Contains(SecondaryHeaderStyleName.ToLower())))
                    {
                        // Rebuild the heading text from its runs
                        StringBuilder sb = new StringBuilder();
                        foreach (var run in paragraph.Elements<Run>())
                            sb.Append(run.InnerText);

                        string ChapterTitle = sb.ToString().Trim().ToUpper();
                        IsParagraphsToDelete = ListOfDocumentTests.Any(x => x.Title.ToUpper().Trim() == ChapterTitle && x.IsIncluded == false);

                        // Empty heading paragraphs are dropped as well
                        if (string.IsNullOrEmpty(ChapterTitle) && !IsParagraphsToDelete)
                            ElementsToDeleteList.Add(paragraph);
                    }
                }

                // Collect the heading itself and every paragraph or table that follows it
                if (IsParagraphsToDelete && (element is Paragraph || element is Table))
                {
                    ElementsToDeleteList.Add(element);
                }
            }

            // Remove only after the traversal, so the tree is not modified while enumerating
            foreach (OpenXmlElement elemToDelete in ElementsToDeleteList)
            {
                elemToDelete.RemoveAllChildren();
                elemToDelete.Remove();
            }

            wordDoc.MainDocumentPart.Document.Save();
        }

Upvotes: 1

jn1kk

Reputation: 5102

It depends on how the paragraphs are organized.

If the heading and its body paragraph are next to each other, just loop through the paragraphs, find the one containing the heading, and delete it together with the paragraph that follows.

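// body is assumed to be wordDoc.MainDocumentPart.Document.Body
bool remove = false;

// ToList() (System.Linq) takes a snapshot, so paragraphs can be removed while looping
foreach (Paragraph p in body.Descendants<Paragraph>().ToList())
{
    if (remove)
    {
        // Paragraph directly after the heading: delete it and reset the flag
        p.Remove();
        remove = false;
        continue;
    }

    if (p.InnerText.Contains("Heading 2"))
    {
        // Delete the heading itself and flag the next paragraph for removal
        p.Remove();
        remove = true;
    }
}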
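
The loop above removes only the single paragraph after the heading. If a heading can be followed by several paragraphs or tables, a variation along these lines might work. It is only a sketch under some assumptions: the class `SectionRemover` and the method `RemoveSection` are hypothetical names, headings are assumed to be direct children of the body, and the style id is assumed to be `Heading1` (the usual id in English-language templates; localized or custom templates may use a different one).

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class SectionRemover
{
    // True when the element is a paragraph whose style id matches headingStyleId.
    private static bool IsHeading(OpenXmlElement element, string headingStyleId)
    {
        string styleId = (element as Paragraph)?.ParagraphProperties?.ParagraphStyleId?.Val?.Value;
        return styleId != null &&
               styleId.Equals(headingStyleId, StringComparison.OrdinalIgnoreCase);
    }

    // Removes the heading whose text equals headingText, plus every following
    // body-level element up to (but not including) the next heading.
    // Returns the number of elements removed.
    public static int RemoveSection(Body body, string headingText, string headingStyleId = "Heading1")
    {
        var toRemove = new List<OpenXmlElement>();
        bool inSection = false;

        foreach (OpenXmlElement child in body.ChildElements)
        {
            // Every heading switches the flag on or off, depending on its text.
            if (IsHeading(child, headingStyleId))
                inSection = child.InnerText.Trim().Equals(headingText, StringComparison.OrdinalIgnoreCase);

            // Keep the section properties (page layout) that can close the body.
            if (inSection && !(child is SectionProperties))
                toRemove.Add(child);
        }

        // Remove after the loop so the child list is not modified while enumerating.
        foreach (OpenXmlElement element in toRemove)
            element.Remove();

        return toRemove.Count;
    }
}
```

It could be called as `SectionRemover.RemoveSection(wordDoc.MainDocumentPart.Document.Body, "Heading 2");` followed by `wordDoc.MainDocumentPart.Document.Save();`. Because it only looks at direct children of the body, headings nested inside tables or content controls would not be detected.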

Upvotes: 2
