Smallgods

Reputation: 237

Convert restricted RTF subset to plaintext with HTML formatting tags

I'm looking to take the output of a WPF RichTextBox that is locked down to allow only certain formatting commands (bold, underline and italic) and convert it to plain text with HTML tags denoting the formatting. This is so that the formatting information can be picked up and parsed by an Oracle Publishing interface.

All other information, such as font sizes and colors, is unimportant, as it will be handled by the Publishing template further down the line.

Ideally, then, we would end up with something like the following, with all other RTF control codes stripped out:

This is <b>some bold text, with <i>this bit</i> italic as well</b>

Is there a relatively easy way to do this? I've seen some regex patterns, but they always seem to let unwanted RTF material through. I don't really want to use a commercial solution, as it's quite a small problem. Any ideas?

Upvotes: 1

Views: 985

Answers (1)

Athari

Reputation: 34293

You should parse the RTF and replace the relevant control codes with HTML tags. Given the complexity of RTF (nested groups, font and color tables, escaped characters and Unicode escapes), I don't think a regex alone will be enough.
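A minimal sketch of that approach, written in Python for brevity (the question's WPF code would be C#, but the logic ports directly). It assumes the RTF uses code page 1252 and the default `\uc1`, keeps only `\b`, `\i` and `\ul`, and skips a partial, hand-picked list of destinations plus any `\*` group. `rtf_to_html` and `SKIP_DESTINATIONS` are illustrative names, not an existing library API. It tracks formatting per `{...}` group and re-nests tags so the output stays well-formed:

```python
import html
import re

# One alternative per kind of RTF token.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, optional parameter, optional space delimiter
    r"|\\'([0-9a-fA-F]{2})"      # \'hh: one byte in the document's ANSI code page
    r"|\\(.)"                    # control symbol: \\ \{ \} \* \~ ...
    r"|([{}])"                   # group open / close
    r"|([^\\{}\r\n]+)"           # run of plain text
    r"|[\r\n]",                  # raw line breaks carry no meaning in RTF
    re.DOTALL,
)

# Assumption: a partial list of destinations whose text is not visible content.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict",
                     "header", "footer", "listtable", "listoverridetable"}
TAG_ORDER = ("b", "i", "u")  # order in which missing tags are (re)opened


def rtf_to_html(rtf: str, codepage: str = "cp1252") -> str:
    """Hypothetical helper: RTF to text with <b>, <i>, <u>; other formatting dropped."""
    state = {"b": False, "i": False, "u": False, "skip": False}
    saved = []      # states of enclosing groups, restored on "}"
    open_tags = []  # HTML tags currently open, innermost last
    out = []
    fallback = 0    # characters still to discard after a \uN escape

    def emit(text: str) -> None:
        # Close tags until none of the open ones is switched off; popping from the
        # innermost end keeps the output well-nested even if RTF toggles overlap.
        while any(not state[t] for t in open_tags):
            out.append(f"</{open_tags.pop()}>")
        # Open any tag whose formatting is on but not currently open.
        for t in TAG_ORDER:
            if state[t] and t not in open_tags:
                out.append(f"<{t}>")
                open_tags.append(t)
        out.append(html.escape(text, quote=False))

    for m in TOKEN.finditer(rtf):
        word, param, hexbyte, symbol, brace, text = m.groups()
        if brace == "{":
            saved.append(dict(state))
        elif brace == "}":
            if saved:
                state.update(saved.pop())
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                state["skip"] = True
            elif state["skip"]:
                continue
            elif word in ("b", "i"):
                state[word] = param != "0"   # \b and \b1 switch on, \b0 switches off
            elif word == "ul":
                state["u"] = param != "0"
            elif word == "ulnone":
                state["u"] = False
            elif word == "plain":            # reset character formatting
                state.update(b=False, i=False, u=False)
            elif word in ("par", "line"):
                emit("\n")
            elif word == "tab":
                emit("\t")
            elif word == "u" and param is not None:
                emit(chr(int(param) % 0x10000))  # negative values wrap to 16 bits
                fallback = 1                      # assumes the default \uc1
            # every other control word (fonts, sizes, colours, ...) is ignored
        elif hexbyte is not None and not state["skip"]:
            if fallback:
                fallback -= 1
            else:
                emit(bytes([int(hexbyte, 16)]).decode(codepage, errors="replace"))
        elif symbol is not None:
            if symbol == "*":                # ignorable destination: skip the group
                state["skip"] = True
            elif state["skip"]:
                continue
            elif symbol in "\\{}":
                emit(symbol)
            elif symbol == "~":
                emit("\u00a0")               # non-breaking space
            elif symbol in "\r\n":
                emit("\n")                   # backslash + newline acts like \par
        elif text is not None and not state["skip"]:
            if fallback:
                drop = min(fallback, len(text))
                text, fallback = text[drop:], fallback - drop
            if text:
                emit(text)

    while open_tags:                         # close whatever is still open
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)


sample = (r"{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0 Segoe UI;}}"
          r"{\colortbl;\red0\green0\blue0;}"
          r"\f0\fs24 This is \b some bold text, with \i this bit\i0  italic as well\b0\par}")
print(rtf_to_html(sample))
# This is <b>some bold text, with <i>this bit</i> italic as well</b>
```

Because RTF toggles need not nest (for example `\b A\i B\b0 C`), the sketch closes and reopens tags instead of mapping each control word to a single tag, which is something a straight regex substitution cannot easily do. It also HTML-escapes `<`, `>` and `&` in the text so they are not mistaken for formatting tags downstream.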

Upvotes: 1
