Reputation: 766
I have added the RTF file in a comment. Copy the following text into a text editor and save it in RTF format.
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
FileInputStream inputstream = new FileInputStream(new File("level1Missing.rtf"));
ParseContext pcontext = new ParseContext();
RTFParser rt = new RTFParser();
rt.parse(inputstream, handler, metadata, pcontext);
// Print the text extracted from the RTF document
System.out.println("Contents of the RTF document:\n\n" + handler.toString());
Upvotes: 3
Views: 605
Reputation: 1334
In my view, the problem is not in Apache Tika but in the RTF file itself: a \par is missing before {\line {\b Level1} : \par}.
You can reproduce this with another, simpler file:
{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\par
This is some {\b bold} text.\par
}
If you remove the \par before This is some {\b bold} text.\par, Tika will extract only the last characters of the first line.
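To compare the two cases side by side, both variants of the sample file can be generated programmatically and then passed to the parser from the question. The following is only a minimal sketch using the Java standard library: the class name RtfVariants and the output file names withPar.rtf and withoutPar.rtf are hypothetical, and the sketch does not call Tika itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: builds the sample RTF with and without the leading \par.
final class RtfVariants {

    // The sample document from the answer, with \par before the first text line.
    static final String WITH_PAR =
            "{\\rtf1\\ansi{\\fonttbl\\f0\\fswiss Helvetica;}\\f0\\par\n"
            + "This is some {\\b bold} text.\\par\n"
            + "}";

    // The same document with the first \par removed (the line break is kept).
    static String withoutLeadingPar() {
        String marker = "\\par\n";
        int i = WITH_PAR.indexOf(marker);
        return WITH_PAR.substring(0, i) + "\n" + WITH_PAR.substring(i + marker.length());
    }

    public static void main(String[] args) throws IOException {
        // File names are assumptions; adjust them to your setup.
        Files.write(Paths.get("withPar.rtf"),
                WITH_PAR.getBytes(StandardCharsets.US_ASCII));
        Files.write(Paths.get("withoutPar.rtf"),
                withoutLeadingPar().getBytes(StandardCharsets.US_ASCII));
    }
}
```

Running the question's code against each generated file in turn should show whether the extracted text differs between the two variants.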
Upvotes: 4