Reputation: 237
I'm looking to take the output of a WPF RichTextBox, which is locked down to allow only certain formatting commands (Bold, Underline and Italic), and convert it to plain text with HTML tags denoting the formatting. This is so that the formatting information can be picked up and parsed by an Oracle Publishing interface.
All other information, such as font sizes and colors, is not important, as it will be handled by the Publishing template further down the line.
Ideally, then, we would have something like the following, with all other RTF tags stripped out:
This is <b>some bold text, with <i>this bit</i> italic as well</b>
Is there a relatively easy way to do this? I've seen some regex patterns, but they always seem to let unwanted RTF material through. I'd rather not use a commercial solution, as it's quite a small problem. Any ideas?
Upvotes: 1
Views: 985
Reputation: 34293
You should parse the RTF and replace the relevant control codes with HTML tags. Considering the complexity of RTF, I don't think regex alone will be enough.
Rich Text Format (RTF) Specification, version 1.6. The syntax is relatively easy; I think you just need to process control codes like \b for bold, etc.
NRTFTree - A class library for RTF processing in C#. Its SAX parser is probably what you need.
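To illustrate the "process the control codes" idea, here is a minimal sketch of a hand-rolled tokenizer. It is written in Python only for brevity; the same loop should port directly to C#. This is not NRTFTree's API, and it covers only a small slice of the spec: groups, \b, \i, \ul (with their 0 / \ulnone / \plain off switches), \par, \'hh and \uN escapes, with everything else dropped. The names `rtf_to_html`, `SKIPPED` and `FORMAT`, the cp1252 default code page, and the assumption of one fallback character after \uN are mine, not something from the spec text or the question.

```python
import html
import re

# One match per token; the alternatives are tried in this order.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"      # \'hh: one byte in the document code page
    r"|\\([^a-zA-Z])"            # control symbol such as \\ \{ \} \*
    r"|([{}])"                   # group open / close
    r"|[\r\n]+"                  # raw line breaks carry no meaning in RTF
    r"|(.)",                     # one plain text character
    re.DOTALL)

# Groups whose content is not document text (assumed list, extend as needed).
SKIPPED = {"fonttbl", "colortbl", "stylesheet", "info", "pict", "header", "footer"}
FORMAT = {"b": "b", "i": "i", "ul": "u"}   # RTF control word -> HTML tag
TAGS = ("b", "i", "u")                     # fixed nesting order keeps output well formed


def rtf_to_html(rtf, codepage="cp1252"):
    out, open_tags, stack = [], [], []
    state = {"b": False, "i": False, "u": False, "skip": False}
    fallback = 0   # characters still to drop after a \uN escape

    def emit(text):
        nonlocal fallback
        if state["skip"]:
            return
        if fallback:               # this is the ANSI stand-in for a \uN character
            fallback -= 1
            return
        wanted = [t for t in TAGS if state[t]]
        keep = 0                   # length of the shared prefix of open and wanted tags
        while keep < min(len(open_tags), len(wanted)) and open_tags[keep] == wanted[keep]:
            keep += 1
        while len(open_tags) > keep:
            out.append("</%s>" % open_tags.pop())
        for tag in wanted[keep:]:
            out.append("<%s>" % tag)
            open_tags.append(tag)
        out.append(html.escape(text, quote=False))

    for match in TOKEN.finditer(rtf):
        word, param, hexbyte, symbol, brace, char = match.groups()
        if brace == "{":
            stack.append(dict(state))       # formatting is scoped to the group
        elif brace == "}":
            if stack:
                state = stack.pop()
        elif word in FORMAT:
            state[FORMAT[word]] = param != "0"   # \b turns on, \b0 turns off
        elif word == "ulnone":
            state["u"] = False
        elif word == "plain":
            state.update(b=False, i=False, u=False)
        elif word in SKIPPED or symbol == "*":
            state["skip"] = True            # ignore the rest of this group
        elif word in ("par", "line"):
            emit("\n")
        elif word == "tab":
            emit("\t")
        elif word == "u" and param:
            code = int(param)
            emit(chr(code + 65536 if code < 0 else code))
            if not state["skip"]:
                fallback = 1                # assumes \uc1, i.e. one fallback character
        elif hexbyte:
            emit(bytes.fromhex(hexbyte).decode(codepage, errors="replace"))
        elif symbol in ("\\", "{", "}"):
            emit(symbol)
        elif char:
            emit(char)
        # every other control word (fonts, sizes, colors, ...) is dropped

    while open_tags:
        out.append("</%s>" % open_tags.pop())
    return "".join(out).strip()


sample = (r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 This is "
          r"{\b some bold text, with {\i this bit} italic as well}\par}")
print(rtf_to_html(sample))
# This is <b>some bold text, with <i>this bit</i> italic as well</b>
```

Tags are only closed back to the point where the open and wanted lists diverge, which is what lets the italic run nest inside the bold one instead of producing a close/reopen pair around it. For anything beyond this narrow case (tables, lists, embedded objects, other code pages), a real parser such as the one mentioned above is likely the safer route.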
Upvotes: 1