Ilyas

Reputation: 305

Remove HTML tags from a string and keep the text Open XML friendly?

I am inserting text into an Open XML document. The text I retrieve and insert contains HTML formatting, e.g. <p>some text</p> <p>More text</p>, so the tags show up as literal text in Word. Can text containing HTML be converted into something an Open XML document will understand?

Upvotes: 0

Views: 1633

Answers (1)

mausworks

Reputation: 1625

New answer:

There is actually a project on CodePlex that does exactly what you are looking for.

See the project here:
Html to OpenXml on CodePlex

However, if the formatting (headings, paragraphs, etc.) is not important, you can just strip the HTML tags entirely.

Here is a tutorial on how to do that:
C# Remove HTML Tags
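
For reference, a minimal sketch of the tag-stripping approach in C#. It is not the code from the linked tutorial; the class name HtmlText and the method StripTags are made up for illustration. It assumes the input is simple markup like the <p>...</p> fragments in the question. A regular expression is not a full HTML parser, so comments, scripts, or a ">" inside an attribute value would need a real parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class HtmlText
{
    // Matches anything that looks like a tag, e.g. <p>, </p>, <br />.
    private static readonly Regex TagPattern =
        new Regex("<[^>]*>", RegexOptions.Compiled);

    // Matches runs of whitespace so they can be collapsed to one space.
    private static readonly Regex WhitespaceRuns =
        new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Replace each tag with a space so adjacent paragraphs do not run together.
        string withoutTags = TagPattern.Replace(html, " ");

        // Turn entities such as &amp; back into plain characters.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Collapse the leftover whitespace and trim the ends.
        return WhitespaceRuns.Replace(decoded, " ").Trim();
    }
}
```

Calling HtmlText.StripTags("<p>some text</p><p>More text</p>") gives "some text More text", which can then be assigned to the text of a run like any other plain string.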


Old answer (the OP worded the question a bit oddly, and I misunderstood):

What you need to do is encode your HTML somehow; you could use Base64 or whatever floats your boat. Plain HTML-encoding would probably be the best course of action here.

This way the HTML will not break your XML.

ASP.NET has built-in support for this, but you can do it in any application by importing the required namespace.

Here's an example: HtmlEncode from Class Library
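
A minimal sketch of HTML-encoding outside ASP.NET, as an illustration rather than the code behind the link. It assumes System.Net.WebUtility is available (.NET Framework 4.0 or later, or .NET Core); the class name EncodeDemo and the method EncodeForXml are made up for this example.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only.
public static class EncodeDemo
{
    // Escapes markup characters such as < and > so the string
    // can no longer be mistaken for tags when embedded in XML.
    public static string EncodeForXml(string html)
    {
        return WebUtility.HtmlEncode(html);
    }
}
```

EncodeDemo.EncodeForXml("<p>some text</p>") returns "&lt;p&gt;some text&lt;/p&gt;". Note that this only makes the markup safe to embed; the tags still appear as literal text in the document, which is why the new answer above is the better fit for the question.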

Upvotes: 1
