uzair_syed

Reputation: 313

Run Tika source code from Eclipse

I have been using Apache Tika to extract text from different document formats. Now I want it to handle headers, footers, and text boxes differently, so I downloaded the Tika source code from GitHub and am trying to make changes to it.

I want to run the Apache Tika source code from Eclipse and debug its execution by passing in an input document. How can I do that? There are so many main classes; where do I start? I understand it's a Maven project, and I am new to Maven.

And once I have made changes, how can I build a new jar file?

Upvotes: 0

Views: 175

Answers (1)

Konstantin Gribov

Reputation: 670

Take a look at Tika's XHTML output first. It may already mark up headers and footers, in which case you can handle those parts through the parser API without modifying Tika itself. If so, use the API as the examples show, passing a custom SAX-style content handler to the parser.
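To illustrate the handler idea, here is a rough sketch of a SAX handler that routes text into separate buffers depending on the enclosing element's class attribute. It uses only the JDK's SAX classes and feeds the handler a hand-written XHTML string, so it runs without Tika on the classpath. The class name SectionSplittingHandler and the class values "header" and "footer" are assumptions for illustration; check the XHTML Tika actually emits for your documents before relying on them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects header, footer and body text separately.
public class SectionSplittingHandler extends DefaultHandler {
    private final StringBuilder body = new StringBuilder();
    private final StringBuilder header = new StringBuilder();
    private final StringBuilder footer = new StringBuilder();
    // Top of the stack is the buffer that receives the current text.
    private final Deque<StringBuilder> targets = new ArrayDeque<>();

    public SectionSplittingHandler() {
        targets.push(body);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Assumed markup: <div class="header"> and <div class="footer">.
        String cls = atts.getValue("class");
        if ("header".equals(cls)) {
            targets.push(header);
        } else if ("footer".equals(cls)) {
            targets.push(footer);
        } else {
            // Nested elements inherit the buffer of their parent.
            targets.push(targets.peek());
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        targets.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        targets.peek().append(ch, start, length);
    }

    public String getBody() { return body.toString(); }
    public String getHeader() { return header.toString(); }
    public String getFooter() { return footer.toString(); }

    // Demo helper: parses an XHTML string with the JDK SAX parser.
    public static SectionSplittingHandler parseXhtml(String xhtml) {
        SectionSplittingHandler handler = new SectionSplittingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XHTML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        SectionSplittingHandler h = parseXhtml(
            "<html><body><div class=\"header\">Page header</div>"
            + "<p>Main text</p><div class=\"footer\">Page footer</div></body></html>");
        System.out.println("header: " + h.getHeader());
        System.out.println("body:   " + h.getBody());
        System.out.println("footer: " + h.getFooter());
    }
}
```

With Tika on the classpath, the same handler could be passed to the parser instead of the demo helper, for example new AutoDetectParser().parse(stream, handler, new Metadata(), new ParseContext()), since DefaultHandler implements the ContentHandler interface that Tika's Parser.parse expects.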

Upvotes: 1
