Problems parsing a table inside an RTF file using Apache Tika

Question

I'm trying to parse an RTF file using Apache Tika. The file contains a table with several columns.

The problem is that the parser writes out the cell values as plain text, with no indication of which column each value came from.

What I'm doing right now is:

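import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.Tika;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// tc is a TikaConfig created elsewhere; file is the RTF file to parse
AutoDetectParser adp = new AutoDetectParser(tc);
Metadata metadata = new Metadata();
String mimeType = new Tika().detect(file);
metadata.set(Metadata.CONTENT_TYPE, mimeType);
// BodyContentHandler keeps only the plain text of the document body
BodyContentHandler handler = new BodyContentHandler();

// try-with-resources closes the stream even if parsing throws
try (InputStream fis = new FileInputStream(file)) {
    adp.parse(fis, handler, metadata, new ParseContext());
}
System.out.println(handler.toString());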

It works, but I also need structural information about the content, such as which table column each value belongs to.

Is there already a Handler which outputs something like HTML with a structure of the read RTF file?
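To make concrete what kind of handler I mean: as far as I understand, Tika parsers report content as XHTML SAX events, and BodyContentHandler flattens those events to text. The sketch below is a hypothetical handler (TableCellCollector is my own name, not a Tika class) that records each cell together with its row and column index. It assumes the RTF parser actually emits table, tr and td events, which I have not confirmed and which may depend on the Tika version. The collect method only demonstrates the handler on a small XHTML string with the JDK's SAX parser, so it runs without Tika.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: records each table cell with its row and column index. */
class TableCellCollector extends DefaultHandler {
    private final List<String> cells = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private int row = -1;
    private int col = -1;
    private boolean inCell = false;

    // Namespace-aware sources fill localName, others only qName
    private static String name(String localName, String qName) {
        return localName.isEmpty() ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String n = name(localName, qName);
        if (n.equals("tr")) {
            row++;          // new row: restart the column counter
            col = -1;
        } else if (n.equals("td") || n.equals("th")) {
            col++;
            inCell = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inCell) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String n = name(localName, qName);
        if (n.equals("td") || n.equals("th")) {
            cells.add("r" + row + "c" + col + "=" + text.toString().trim());
            inCell = false;
        }
    }

    public List<String> getCells() {
        return cells;
    }

    /** Demo without Tika: feed an XHTML string through the JDK SAX parser. */
    public static List<String> collect(String xhtml) throws Exception {
        TableCellCollector handler = new TableCellCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getCells();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<table><tr><td>a</td><td>b</td></tr></table>"));
    }
}
```

With Tika, an instance of this class would be passed to adp.parse(...) in place of the BodyContentHandler and read back through getCells(). Nested tables are not handled, and the row counter keeps running across several tables. If only the raw markup is needed, org.apache.tika.sax.ToXMLContentHandler serializes the same events as XHTML text.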
