Extract fields from ODT document using Java library

Question

I need a Java library, or some code, to extract field tags from the content of an ODT document. I know an ODT file is a zip archive and that its contents are in a content.xml file. Of course I could just unzip the archive, open content.xml and parse it, but I believe some higher-level code exists. Just as an example, the content looks like this:

Hi ${name}!    

$nome

I would like to extract the fields as ${name} and $nome.

I know Apache Tika could be used for that, but I haven't spotted an example that actually shows field extraction. I believe this is because the fields I am using are unstructured text instead of input field tags.

Thanks in advance, Daniel

dannyxyz22 · Accepted Answer

Well, just in case anyone is interested, we ended up using Apache Tika to obtain the text content of the ODT file, and then we parsed that text with the following regular expression:

\$\{[\w\-\.]*\}
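
For anyone who wants to try this, here is a minimal sketch of the matching step in plain Java. It assumes the document text is already available as a String (for example from Tika's parseToString facade method); the class name OdtFieldExtractor and the method extractFields are just illustrative names, not part of any library. Note that the expression above only matches the braced form ${name}. The second pattern, BRACED_OR_BARE, is my own hypothetical variant that also picks up the bare $nome form from the question, so treat it as an assumption rather than what we actually ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper; the name is hypothetical and not part of Tika.
final class OdtFieldExtractor {

    // The regular expression from the answer: "${", then any run of
    // word characters, hyphens, or dots, then "}".
    static final Pattern BRACED = Pattern.compile("\\$\\{[\\w\\-\\.]*\\}");

    // Assumed variant (not from the answer) that also accepts the
    // bare "$nome" form shown in the question.
    static final Pattern BRACED_OR_BARE =
            Pattern.compile("\\$\\{[\\w\\-\\.]*\\}|\\$\\w+");

    // Returns every match of the pattern, in order of appearance.
    static List<String> extractFields(String text, Pattern pattern) {
        List<String> fields = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            fields.add(m.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the text extracted from the ODT document.
        String text = "Hi ${name}!\n\n$nome";
        System.out.println(extractFields(text, BRACED));         // [${name}]
        System.out.println(extractFields(text, BRACED_OR_BARE)); // [${name}, $nome]
    }
}
```

Matching against the extracted plain text rather than the raw content.xml avoids having to deal with the XML markup around each field.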
