Reputation: 63
I would like to take arbitrary text and encode it as RTF. I found a simple solution: put the text into a "basic template".
This works as long as the text contains no special characters.
I need to be able to escape Japanese, Chinese, Russian, and special Latin characters, etc.
For example, this:
追伸。次回の発表が気になる場合は、こちらをご確認ください。
should be escaped to this:
\'92\'c7\'90\'4c\'81\'42\'8e\'9f\'89\'f1\'82\'cc\'94\'ad\'95\'5c\'82\'aa\'8b\'43\'82\'c9\'82\'c8\'82\'e9\'8f\'ea\'8d\'87\'82\'cd\'81\'41\'82\'b1\'82\'bf\'82\'e7\'82\'f0\'82\'b2\'8a\'6d\'94\'46\'82\'ad\'82\'be\'82\'b3\'82\'a2\'81\'42\
Is there a C# library that handles this, or a simple way to do it myself?
Upvotes: 1
Views: 3719
Reputation: 13
This is an old topic, but if line break characters cause your text to be truncated, a couple of extra conditions fix it:
public static string Escape(string s) {
    if (s == null)
        return s;
    var sb = new StringBuilder(); // requires: using System.Text;
    for (int i = 0; i < s.Length; i++) {
        char c = s[i];
        // Skip \r; the \n that follows produces the line break
        if (c == 13)
            continue;
        // \n becomes an RTF line break
        if (c == 10) {
            sb.Append("\\line ");
        }
        else if (c >= 0x20 && c < 0x80) {
            // Printable ASCII; backslash and braces need a leading backslash
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\');
            sb.Append(c);
        }
        else if (c < 0x20 || (c >= 0x80 && c <= 0xFF)) {
            // \'hh hex escape: backslash, apostrophe, exactly two hex digits
            sb
                .Append("\\'")
                .Append(((byte) c).ToString("x2"));
        }
        else {
            // \uN Unicode escape followed by a one-character fallback
            sb
                .Append("\\u")
                .Append((uint) c)
                .Append('?');
        }
    }
    return sb.ToString();
}
Upvotes: 0
Reputation: 21
Thanks for the help!
I was looking at this thread because I needed an RTF escape for passing text to WordPad or any RTF control. I modified the C# version slightly to handle char values greater than 32767, where the cast to short was not enough.
I also needed to force two hexadecimal digits for chars in the ranges 0x00-0x1F and 0x80-0xFF (which is why I use ToString("X2")):
public static string Escape(string source)
{
    if (string.IsNullOrEmpty(source)) return string.Empty;
    var sb = new StringBuilder();
    foreach (char c in source)
    {
        if (c >= 0x20 && c < 0x80)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        else if (c < 0x20 || (c >= 0x80 && c <= 0xFF))
        {
            sb.Append($"\\'{((byte)c).ToString("X2")}");
        }
        else
        {
            sb.Append($"\\u{(int)c}?");
        }
    }
    return sb.ToString();
}
Hope this could help!
Upvotes: 1
Reputation: 31
C# version of Yongtao Wang's answer:
public static string Escape(string s)
{
    if (s == null) return s;
    var sb = new StringBuilder();
    foreach (char c in s)
    {
        if (c >= 0x20 && c < 0x80)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        else if (c < 0x20 || (c >= 0x80 && c <= 0xFF))
        {
            // Two lowercase hex digits (e.g. \'0d) so FixLineBreaks can match them
            sb.Append($"\\'{((byte)c).ToString("x2")}");
        }
        else
        {
            sb.Append($"\\u{(short)c}?");
        }
    }
    return sb.ToString();
}
If the string can contain line breaks, pass the escaped string through this method before returning it:
private static string FixLineBreaks(string str)
{
    // Escape emits \r\n as \'0d\'0a and a bare \n as \'0a
    return str.Replace(@"\'0d\'0a", @"\line ").Replace(@"\'0a", @"\line ");
}
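For comparison, here is a minimal sketch of the same idea with the line-break handling folded into the loop, so no post-processing pass is needed. It is written in Java (the language of another answer in this thread) and is an illustration, not a tested library: the class name RtfUnicodeEscape and the toDocument helper are made up for this example, and the \'hh branch assumes the reader's code page agrees with Latin-1 for 0x80-0xFF.

```java
final class RtfUnicodeEscape {
    // Escapes text for use inside an RTF group (hypothetical helper).
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r') {
                continue;                       // dropped; the '\n' of a CRLF pair emits the break
            } else if (c == '\n') {
                sb.append("\\line ");           // RTF line break
            } else if (c >= 0x20 && c < 0x80) {
                if (c == '\\' || c == '{' || c == '}') {
                    sb.append('\\');            // escape RTF syntax characters
                }
                sb.append(c);
            } else if (c <= 0xFF) {
                // Control chars and 0x80-0xFF: \'hh with exactly two hex digits
                sb.append(String.format("\\'%02x", (int) c));
            } else {
                // \uN takes a signed 16-bit decimal, so values above 32767 go negative
                sb.append("\\u").append((short) c).append('?');
            }
        }
        return sb.toString();
    }

    // Wraps the escaped text in a bare-bones RTF document (assumed template).
    static String toDocument(String s) {
        return "{\\rtf1\\ansi\\uc1 " + escape(s) + "}";
    }
}
```

Note that a lone \r (without a following \n) is simply dropped here; handle it separately if your input can contain old Mac-style line endings.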
Upvotes: 3
Reputation: 39
You need to deal with different character sets, which is not easy.
First, you need to convert the text to the character encoding for the language you need, such as GB2312 for Chinese, then write each byte value as a hex string.
The easiest way is to emit Unicode escapes instead, which current RTF readers support.
Here is some code in Java; it should be easy to convert to C#:
public static String escape(String s){
    if (s == null) return s;
    int len = s.length();
    StringBuilder sb = new StringBuilder(len);
    for (int i = 0; i < len; i++){
        char c = s.charAt(i);
        if (c >= 0x20 && c < 0x80){
            if (c == '\\' || c == '{' || c == '}'){
                sb.append('\\');
            }
            sb.append(c);
        }
        else if (c < 0x20 || (c >= 0x80 && c <= 0xFF)){
            sb.append("\\'");//backslash + apostrophe starts a hex escape
            sb.append(String.format("%02x", (int) c));//always two hex digits
        }else{
            sb.append("\\u");
            sb.append((short)c);
            //two bytes ignored; this assumes the document sets \uc2.
            //With the default \uc1, append a single "?" instead.
            sb.append("??");
        }
    }
    return sb.toString();
}
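The code-page route mentioned at the start of this answer (the form the question's expected output uses) can be sketched as follows. This is only an illustration under assumptions: the class name RtfCodePageEscape is made up, the caller must pick a Charset that matches the code page the RTF document declares, and characters the charset cannot encode are silently replaced by getBytes rather than reported.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RtfCodePageEscape {
    // Encodes each non-ASCII character in the given code page and writes
    // every resulting byte as a \'hh escape (hypothetical helper).
    static String escape(String text, Charset codePage) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int n = Character.charCount(cp);    // 2 for surrogate pairs, else 1
            if (cp >= 0x20 && cp < 0x80) {
                if (cp == '\\' || cp == '{' || cp == '}') {
                    sb.append('\\');            // escape RTF syntax characters
                }
                sb.append((char) cp);
            } else {
                // Multi-byte code pages yield a lead and a trail byte;
                // both are written as hex escapes.
                byte[] bytes = text.substring(i, i + n).getBytes(codePage);
                for (byte b : bytes) {
                    sb.append(String.format("\\'%02x", b & 0xFF));
                }
            }
            i += n;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Latin-1 is used because every Java runtime is guaranteed to ship it.
        System.out.println(escape("caf\u00e9 {x}", StandardCharsets.ISO_8859_1));
        // prints: caf\'e9 \{x\}
    }
}
```

For the Japanese text in the question you would pass Charset.forName("Shift_JIS") instead, provided your Java runtime includes that charset; the Unicode escapes shown above avoid that dependency.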
Upvotes: 1