Reputation: 123
How do you convert an RTF string to plain text in Java? The obvious answer is to use Swing's RTFEditorKit, and that seems to be the common answer around the Internet. However, the write method that claims to return plain text isn't actually implemented; in Java 6 it's hard-coded to just throw an IOException.
Upvotes: 12
Views: 33605
Reputation: 51
Here is complete code that parses an RTF file and writes it out as plain text:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;
public class RtfToText {
    public static void main(String[] args) throws IOException, BadLocationException {
        RTFEditorKit rtf = new RTFEditorKit();
        Document doc = rtf.createDefaultDocument();
        // Parse the RTF file into the document; try-with-resources closes the reader.
        try (Reader in = new InputStreamReader(
                new FileInputStream("C:\\SampleINCData.rtf"), StandardCharsets.UTF_8)) {
            rtf.read(in, doc, 0);
        }
        // The document's text content is the plain text, with the formatting stripped.
        String text = doc.getText(0, doc.getLength());
        // Write the text as UTF-8; an I/O failure now propagates instead of being
        // swallowed, so "Success..." is printed only when the file was written.
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream("B:\\Sample INC Data.txt"), StandardCharsets.UTF_8)) {
            out.write(text);
        }
        System.out.println("Success...");
    }
}
Upvotes: 0
Reputation: 2579
You might consider RTF Parser Kit as a lightweight alternative to the Swing RTFEditorKit. The line below shows plain text extraction from an RTF file: the RTF is read from the input stream, and the extracted text is written to the output stream.
new StreamTextConverter().convert(new RtfStreamSource(inputStream), outputStream, "UTF-8");
(full disclosure: I'm the author of RTF Parser Kit)
Upvotes: 2
Reputation: 8560
I use Swing's RTFEditorKit in Java 6 like this:
RTFEditorKit rtfParser = new RTFEditorKit();
Document document = rtfParser.createDefaultDocument();
rtfParser.read(new ByteArrayInputStream(rtfBytes), document, 0);
String text = document.getText(0, document.getLength());
and that works.
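Since the question asks about an RTF string rather than a byte array, here is a minimal self-contained sketch of the same read-then-getText approach wrapped in a helper. The class name `RtfToText`, the method name `toPlainText`, and the sample RTF literal are illustrative assumptions, not part of the answer above; the sketch also uses the `Reader` overload of `RTFEditorKit.read` instead of the `InputStream` one.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class; the name is chosen only for this example.
final class RtfToText {

    // Parses the RTF markup into a Swing document and returns its text content.
    static String toPlainText(String rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        // Read the markup into the document, starting at offset 0.
        kit.read(new StringReader(rtf), doc, 0);
        // Avoid kit.write(Writer, ...): the question reports it throws an IOException.
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Small sample document, made up for this example.
        String rtf = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0 Hello, world!\\par}";
        // trim() drops any line break the paragraph mark leaves at the end.
        System.out.println(toPlainText(rtf).trim());
    }
}
```

The returned text may end with a line break where the RTF has a paragraph mark, so callers that need an exact string will probably want to trim or normalize line endings.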
Upvotes: 21
Reputation: 40391
Try Apache Tika: http://tika.apache.org/0.9/formats.html#Rich_Text_Format
Upvotes: 6