Reputation: 61606
I am trying to split an RTF file into lines (in my code) and I am not quite getting it right, mostly because I am not really grokking the entirety of the RTF format. It seems that lines can be split by \par or \pard or \par\pard or any number of fun combinations.
I am looking for a piece of code that splits the file into lines in any language really.
Upvotes: 4
Views: 4011
Reputation: 61606
I coded up a quick and dirty routine and it seems to work for pretty much anything I've been able to throw at it. It's in VB6, but easily translatable into anything else.
Private Function ParseRTFIntoLines(ByVal strSource As String) As Collection
Dim colReturn As Collection
Dim lngPosStart As Long
Dim strLine As String
Dim sSplitters(1 To 4) As String
Dim nIndex As Long
' return collection of lines '
' The lines can be split by the following '
' "\par" '
' "\par " '
' "\par\pard " '
' Add these splitters in order so that we do not miss '
' any possible split combos, for instance, "\par\pard" is added before "\par" '
' because if we look for "\par" first, we will miss "\par\pard" '
sSplitters(1) = "\par \pard"
sSplitters(2) = "\par\pard"
sSplitters(3) = "\par "
sSplitters(4) = "\par"
Set colReturn = New Collection
' We have to find each variation '
' We will look for \par and then evaluate which type of separator is there '
Do
lngPosStart = InStr(1, strSource, "\par", vbTextCompare)
If lngPosStart > 0 Then
strLine = Left$(strSource, lngPosStart - 1)
For nIndex = 1 To 4
If StrComp(sSplitters(nIndex), Mid$(strSource, lngPosStart, Len(sSplitters(nIndex))), vbTextCompare) = 0 Then
' remove the 1st line from strSource '
strSource = Mid$(strSource, lngPosStart + Len(sSplitters(nIndex)))
' add to collection '
colReturn.Add strLine
' get out of here '
Exit For
End If
Next
End If
Loop While lngPosStart > 0
' check to see whether there is a last line '
If Len(strSource) > 0 Then colReturn.Add strSource
Set ParseRTFIntoLines = colReturn
End Function
Upvotes: 1
Reputation: 15118
Have you come across O'Reilly's RTF Pocket Guide, by Sean M. Burke ?
On page 13, it says
Here are some rules of thumb for putting linebreaks in RTF:
Or were you thinking of extracting the plaintext as lines, and doing it whatever the language of the plaintext?
Upvotes: 1
Reputation: 10670
You could try the specification (1.9.1) (see External Links on the Wikipedia page - which also has a couple of links to examples or modules in several programming languages).
That would most likely give you an idea of the line insertion "words", so you can split the file into lines using a well-defined set of rules rather than taking a guess at it.
Upvotes: 1