Chum-Chum Scarecrows

Reputation: 539

NoClassDefFoundError when importing Tika 1.13 in Eclipse

I've done the following steps per the Tika guide:

  1. Add the tika-core and tika-parsers dependencies to the pom.xml of the Maven project
  2. Run maven install from Eclipse to produce the tika-core and tika-parsers jars
  3. Add the tika-core and tika-parsers jars to my Eclipse project build path

And I get this runtime exception when trying to run tika:

Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException
    at com.ibm.hrl.ace.pdftotext.TikaExtracter.parse(TikaExtracter.java:33)
    at com.ibm.hrl.ace.pdftotext.Main.AllPdfsToText(Main.java:116)
    at com.ibm.hrl.ace.pdftotext.Main.main(Main.java:34)
Caused by: java.lang.ClassNotFoundException: org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:600)
    at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:786)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:760)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:326)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:741)
    ... 3 more

As far as I can see, when I build the jars using maven, it does add pdfbox properly... from the build log:

[INFO] Including org.apache.pdfbox:pdfbox:jar:2.0.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:fontbox:jar:2.0.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:pdfbox-tools:jar:2.0.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:pdfbox-debugger:jar:2.0.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:jempbox:jar:1.8.12 in the shaded jar.

And here are my maven dependencies:

<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.tika/tika-core -->
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.13</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.tika/tika-parsers -->
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <version>1.13</version>
    </dependency>
</dependencies>

Upvotes: 0

Views: 2413

Answers (1)

YMomb

Reputation: 2387

The problem is that if you manually add the tika-core and tika-parsers jars to your build path, you do not get the transitive dependencies listed in their own POMs. PDFBox, which provides the class missing in your stack trace, is one of them.

So I would suggest:

  1. Remove the tika-core and tika-parsers jars that you built yourself and rely on the versions available on Maven Central instead. This ensures that anyone else building your project gets the same jars (and not locally built ones).
  2. Then you have two options:

(Option A, use Maven) Do not add the jars to the Eclipse build path manually. Rely either on the built-in Maven support for Eclipse (m2e, for instance) or on the Eclipse plugin for Maven (run mvn eclipse:eclipse to update .classpath and .project).

(Option B, without Maven) If you cannot use Maven for your project, you will have to add not only the tika-parsers and tika-core jars but also all (or most) of the transitive dependencies these projects need, including the format-specific libraries (POI for Office, PDFBox for PDF, ...). You can get a list of the dependencies by running mvn dependency:list in the folder containing the POM of tika-parsers.
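
Whichever option you pick, it can help to check which classes your project actually sees at run time. The following is only a rough sketch using plain JDK calls: the class name ClasspathCheck and the helper locate are made up for illustration, and the PDFBox class name is the one from your stack trace. Run it with the same build path as your application.

```java
import java.security.CodeSource;

public class ClasspathCheck {

    /**
     * Returns the location (usually a jar URL) a class is loaded from,
     * or null if the class cannot be loaded from the current classpath.
     * Hypothetical helper, not part of Tika or PDFBox.
     */
    public static String locate(String className) {
        try {
            // initialize=false: only look the class up, do not run static initializers
            Class<?> c = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK typically have no code source
            return src == null ? "(JDK runtime)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.tika.Tika",
            "org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException"
        };
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

If the PDFBox class is reported as NOT FOUND while the Tika class is found, the transitive dependencies are missing from the build path. If it is found, the printed location shows which jar it comes from.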

Upvotes: 4
