Convert Docx to html using OpenXml power tools without formatting

Question

I'm using OpenXml Power Tools in my project to convert a document (docx) into HTML. Using the sample code provided with the SDK, it produces an elegant duplicate in HTML form (GitHub link: https://github.com/OfficeDev/Open-Xml-PowerTools/blob/vNext/OpenXmlPowerToolsExamples/HtmlConverter01/HtmlConverter01.cs).

However, looking at the generated HTML markup, it has embedded styling.

Is there any way of turning this off and using plain and simple tags?

I would like to remove this embedded styling, as the formatting would be taken care of by Bootstrap.

The embedded styling is as follows :

This, as you can see, is fine if you want a direct copy, but not if you want to control the style yourself.

In the C# code I have already made the following adjustments:

AdditionalCss is commented out
FabricateCssClasses is false
CssClassPrefix is commented out

Many thanks.

Xiaoy312 · Accepted Answer

You can also use XmlReader and XmlWriter to obtain bare-bones HTML. This could, however, be a little overkill, as only the tags themselves and their text content will be kept.

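using System.IO;
using System.Text;
using System.Xml;

public static class HtmlHelper
{
    /// <summary>
    /// Keeps only the opening and closing tags and the text content of the HTML.
    /// </summary>
    public static string CleanUp(string html)
    {
        var output = new StringBuilder();
        // The input must be well-formed XML (XHTML), otherwise the reader throws.
        using (var reader = XmlReader.Create(new StringReader(html)))
        {
            var settings = new XmlWriterSettings() { Indent = true, OmitXmlDeclaration = true };
            using (var writer = XmlWriter.Create(output, settings))
            {
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element:
                            // Write the bare tag; attributes (style, class, ...) are never copied.
                            writer.WriteStartElement(reader.LocalName);
                            // Self-closing elements such as <br /> never raise EndElement,
                            // so close them here to keep the nesting balanced.
                            if (reader.IsEmptyElement)
                                writer.WriteEndElement();
                            break;
                        case XmlNodeType.Text:
                            writer.WriteString(reader.Value);
                            break;
                        case XmlNodeType.EndElement:
                            writer.WriteFullEndElement();
                            break;
                    }
                }
            }
        }

        return output.ToString();
    }
}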
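
If dropping every attribute turns out to be too aggressive (links lose their href, images their src), one possible variation is to keep a small allow-list of attributes while copying. The following is only a sketch under assumptions: the class name HtmlAllowListSketch and the choice of href, src and alt are hypothetical and not part of the original answer or of OpenXml Power Tools, and the input is assumed to be well-formed XHTML without a DOCTYPE.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical variation of the clean-up helper; not part of OpenXml Power Tools.
public static class HtmlAllowListSketch
{
    // Assumed allow-list; adjust it to whatever your pages actually need.
    private static readonly HashSet<string> KeptAttributes =
        new HashSet<string> { "href", "src", "alt" };

    public static string CleanUp(string html)
    {
        var output = new StringBuilder();
        // Indentation is left off so the result is easy to compare as a single string.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var reader = XmlReader.Create(new StringReader(html)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Read this before visiting attributes, while still on the element.
                        bool isEmpty = reader.IsEmptyElement;
                        writer.WriteStartElement(reader.LocalName);
                        while (reader.MoveToNextAttribute())
                        {
                            // Copy only allow-listed attributes; style, class and xmlns are dropped.
                            if (KeptAttributes.Contains(reader.LocalName))
                                writer.WriteAttributeString(reader.LocalName, reader.Value);
                        }
                        reader.MoveToElement();
                        // Self-closing elements never raise EndElement, so close them here.
                        if (isEmpty)
                            writer.WriteEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                }
            }
        }
        return output.ToString();
    }
}
```

For example, calling HtmlAllowListSketch.CleanUp on <p style="color:red"><a href="x.html" class="c">Link</a><br /></p> should give <p><a href="x.html">Link</a><br /></p>: the style and class attributes are gone, while the href survives.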

