Reputation: 385144
If txtLog is a RichTextBox control:
Dim text = "hi" & vbCrLf
Debug.WriteLine("t:" & text.Length) ' --> 4, as expected
txtLog.Text = text
Debug.WriteLine("tL:" & txtLog.TextLength) ' --> 3. muh?! :(
Having looked at the RTF spec, the end of a paragraph is notated as \par, which is neither CR nor LF. This makes sense since RTF is a markup language; as in HTML, line endings have little meaning on their own.
So presumably, on writing into the RichTextBox, my line ending is being encoded into \par. And then, on extraction, the \par is being translated back to a real line ending for use.
It turns out that this line ending is vbLf.
Why, since Microsoft near-consistently employ CRLF for line endings, would RichTextBox translate \par to vbLf instead of vbCrLf?
Upvotes: 2
Views: 3792
Reputation: 4320
Your interpretation of the spec is incorrect.
The RTF spec clearly says:
A carriage return (character value 13) or linefeed (character value 10) will be treated as a \par control if the character is preceded by a backslash. You must include the backslash; otherwise, RTF ignores the control word. (You may also want to insert a carriage-return/linefeed pair without backslashes at least every 255 characters for better text transmission over communication lines.)
This makes RTF an almost format-free language, i.e. RTF content is independent from line breaks (i.e. newline characters are not part of the raw text):
Hi
\par
guys
\par<eof>
is the same as
Hi\par guys\par<eof>
i.e. your reader must skip over all CRs and LFs that have no leading backslash; they are not part of the text.
Hi
\
guys
\
<eof>
would, if a newline is CR+LF, have each backslash-prefixed CR handled like a \par token, and each LF skipped (since the LF has no backslash prefix of its own).
So the spec is correct, and precise.
Got it? ;)
(<eof> denotes an end-of-file character here, or the end of the file, whatever your text editor spits out, and a newline is CR, CR LF, or LF, whatever your text editor spits out :))
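To make the rule above concrete, here is a minimal sketch of how a reader might treat line breaks in a tiny RTF fragment. It is not how RichTextBox is actually implemented; ExtractText is a hypothetical helper that understands only \par, backslash+CR, backslash+LF and bare CR/LF, and it emits a single LF per paragraph end simply to mirror what the question observed.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module RtfNewlineSketch
    ' Hypothetical helper (not a real RichTextBox API): pulls plain text
    ' out of a very small RTF fragment. Groups, escapes and every control
    ' word other than \par are deliberately not handled.
    Function ExtractText(rtf As String) As String
        Dim sb As New StringBuilder()
        Dim i As Integer = 0
        While i < rtf.Length
            Dim c As Char = rtf(i)
            If c = "\"c AndAlso i + 1 < rtf.Length Then
                Dim nextChar As Char = rtf(i + 1)
                If nextChar = ControlChars.Cr OrElse nextChar = ControlChars.Lf Then
                    ' Backslash + CR or backslash + LF acts like \par.
                    sb.Append(ControlChars.Lf)
                    i += 2
                Else
                    ' Read the control word (letters only) after the backslash.
                    Dim j As Integer = i + 1
                    While j < rtf.Length AndAlso Char.IsLetter(rtf(j))
                        j += 1
                    End While
                    Dim word As String = rtf.Substring(i + 1, j - i - 1)
                    If word = "par" Then sb.Append(ControlChars.Lf)
                    ' One space after a control word is a delimiter, not text.
                    If j < rtf.Length AndAlso rtf(j) = " "c Then j += 1
                    i = j
                End If
            ElseIf c = ControlChars.Cr OrElse c = ControlChars.Lf Then
                ' Bare CR/LF: layout of the file only, not part of the text.
                i += 1
            Else
                sb.Append(c)
                i += 1
            End If
        End While
        Return sb.ToString()
    End Function

    Sub Main()
        Dim multiLine = "Hi" & vbCrLf & "\par" & vbCrLf & "guys" & vbCrLf & "\par"
        Dim oneLine = "Hi\par guys\par"
        Dim escaped = "Hi" & vbCrLf & "\" & vbCrLf & "guys" & vbCrLf & "\" & vbCrLf

        ' All three fragments should reduce to "Hi", LF, "guys", LF.
        Console.WriteLine(ExtractText(multiLine) = ExtractText(oneLine))
        Console.WriteLine(ExtractText(escaped) = ExtractText(oneLine))
    End Sub
End Module
```

In the third fragment each CR+LF pair after a backslash contributes one paragraph end (from the escaped CR) and the trailing LF is dropped, which is the behaviour described above.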
Why, since Microsoft near-consistently employ CRLF for line endings, would RichTextBox translate \par to vbLf instead of vbCrLf?
CRLF newlines are mainly a Windows convention. On other platforms and in some apps, it is LF only. CR-only newlines were used by classic Mac OS, but no current mainstream platform uses them. There are platforms, though, that handle CR and LF equally, i.e. CRLF counts as TWO newlines there. On others, a CR is ignored if followed immediately by LF (this usually includes Windows apps).
The behavior you see is the only way to make sure the text result produces the same number of newlines on practically all platforms.
(Of course, this is also application-specific... I'd call this one of the lesser-known compatibility nightmares, that newline mess.)
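If your own code needs CRLF endings (or a length that matches the string you originally assigned), one option is to normalize the text after reading it back. The sketch below is only an illustration; ToCrLf is a hypothetical helper name, and it assumes that any lone CR or lone LF should become CRLF.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LineEndingSketch
    ' Hypothetical helper: converts CRLF, lone CR and lone LF to CRLF.
    Function ToCrLf(s As String) As String
        ' Collapse everything to LF first so existing CRLF pairs
        ' are not turned into CR+CR+LF by the final replace.
        Dim lfOnly As String = s.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf)
        Return lfOnly.Replace(vbLf, vbCrLf)
    End Function

    Sub Main()
        ' Stand-in for what the question read back from the control;
        ' in the form this would be ToCrLf(txtLog.Text).
        Dim fromControl As String = "hi" & vbLf
        Console.WriteLine(fromControl.Length)          ' 3
        Console.WriteLine(ToCrLf(fromControl).Length)  ' 4
    End Sub
End Module
```

Normalizing at the boundary keeps the rest of the code independent of whichever line ending the control happens to return.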
Upvotes: 1
Reputation: 3374
The immediate reason RichTextBox is implemented this way is because the RTF specification denotes that a carriage return (by itself) or a linefeed by itself is equivalent to \par.
. . . A carriage return (character value 13) or linefeed (character value 10) will be treated as a \par control . . .
As to why Microsoft would make the specification like this, I don't know for sure. However, I would speculate that it had to do with the fact that the first version of RTF was developed for the Mac version of Microsoft Word in the 1980s. I would guess that they developed this \par rule so that it worked well on a Mac, or worked well as a cross-platform format in general. If this is the case, then Microsoft would probably have been very hesitant to revise the spec in later years ('90s, '00s, etc.) to match standard Windows line endings (since in general Microsoft has a history of trying to support backwards compatibility as much as possible for things like this).
Upvotes: 3