Reputation: 539
I've followed these steps from the Tika guide:
I get this runtime exception when trying to run Tika:
Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException
    at com.ibm.hrl.ace.pdftotext.TikaExtracter.parse(TikaExtracter.java:33)
    at com.ibm.hrl.ace.pdftotext.Main.AllPdfsToText(Main.java:116)
    at com.ibm.hrl.ace.pdftotext.Main.main(Main.java:34)
Caused by: java.lang.ClassNotFoundException: org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:600)
    at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:786)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:760)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:326)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:741)
    ... 3 more
As far as I can tell, when I build with Maven, PDFBox does get included in the shaded jar. From the build log:
[INFO] Including org.apache.pdfbox:pdfbox:jar:2.0.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:fontbox:jar:2.0.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:pdfbox-tools:jar:2.0.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:pdfbox-debugger:jar:2.0.1 in the shaded jar.
[INFO] Including org.apache.pdfbox:jempbox:jar:1.8.12 in the shaded jar.
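The build log only shows what went into the shaded jar, not what the JVM sees when the program is launched. A quick way to check the runtime side is to ask the class loader where it finds the class. This is a minimal diagnostic sketch using only JDK APIs (`Class.forName` and `ProtectionDomain.getCodeSource`); `ClassLocator` is a hypothetical helper, not part of Tika or PDFBox, and it should be run with the same classpath (for example, the same Eclipse launch configuration) that produced the exception.

```java
import java.security.CodeSource;
import java.util.Optional;

// Diagnostic sketch: asks the running JVM where (if anywhere) it finds a class.
final class ClassLocator {

    /**
     * Returns Optional.empty() if the class is not visible to the application class loader,
     * "<no code source>" if the JVM reports none (typical for core JDK classes),
     * otherwise the URL of the jar or directory the class was loaded from.
     */
    static Optional<String> locate(String className) {
        try {
            // initialize = false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return Optional.of("<no code source>");
            }
            return Optional.of(source.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. when a superclass it needs is missing
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException";
        System.out.println(name + " -> " + locate(name).orElse("NOT FOUND on this classpath"));
    }
}
```

If it prints NOT FOUND from the failing launch configuration but a jar path when run against the shaded jar, the jar itself is fine and the problem is the classpath used at launch.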
Here are my Maven dependencies:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.tika/tika-core -->
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.13</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.tika/tika-parsers -->
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <version>1.13</version>
    </dependency>
</dependencies>
Upvotes: 0
Views: 2413
Reputation: 2387
The problem is that if you manually add the tika-core and tika-parsers jars to your build path, you will not get the transitive dependencies listed in their own POMs.
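To make the point about transitive dependencies concrete, here is a toy model of what Maven does for you: start from the declared artifacts and keep following each artifact's own dependency list until nothing new turns up. It is a sketch only. The graph in `main` is a heavily trimmed, illustrative subset (the real tika-parsers POM declares many more libraries), `TransitiveDeps` is a hypothetical name, and real Maven resolution also deals with versions, scopes, exclusions and conflict mediation, all of which this ignores.

```java
import java.util.*;

// Toy model of transitive dependency resolution (breadth-first walk over POM-declared deps).
final class TransitiveDeps {

    static Set<String> resolve(Map<String, List<String>> directDeps, Collection<String> roots) {
        Set<String> resolved = new LinkedHashSet<>();   // keeps discovery order
        Deque<String> toVisit = new ArrayDeque<>(roots);
        while (!toVisit.isEmpty()) {
            String artifact = toVisit.poll();
            // add() returns false for artifacts already seen, which also breaks cycles
            if (resolved.add(artifact)) {
                toVisit.addAll(directDeps.getOrDefault(artifact, List.of()));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Illustrative, heavily trimmed graph; not the full tika-parsers dependency tree.
        Map<String, List<String>> poms = Map.of(
                "tika-parsers", List.of("tika-core", "pdfbox", "poi"),
                "pdfbox", List.of("fontbox"),
                "fontbox", List.of(),
                "poi", List.of(),
                "tika-core", List.of());

        // Adding jars by hand in Eclipse puts only these two on the build path...
        List<String> manuallyAdded = List.of("tika-core", "tika-parsers");
        // ...whereas Maven puts the whole closure on the classpath.
        System.out.println(resolve(poms, manuallyAdded));
    }
}
```

With the trimmed graph above, `main` prints `[tika-core, tika-parsers, pdfbox, poi, fontbox]`; the last three are exactly what a hand-built path containing only the two Tika jars lacks.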
So I would suggest one of the following:
(Option A, use Maven) Do not add the jars to the Eclipse build path manually. Rely either on Maven support built into Eclipse (m2e, for instance) or on the Eclipse plugin for Maven (run mvn eclipse:eclipse to regenerate .classpath and .project).
(Option B, without Maven) If you cannot use Maven for your project, you will have to add not only the tika-parsers and tika-core jars but also all (or most) of the transitive dependencies these projects need, including for instance the format-specific libraries (POI for Office, PDFBox for PDF, ...). You can get a list of those dependencies by running mvn dependency:list in the folder containing the tika-parsers POM.
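For Option B, the output of mvn dependency:list still has to be turned into actual jar files. A minimal sketch, assuming the plugin's usual line format (`[INFO]    groupId:artifactId:type[:classifier]:version:scope`) and the standard local repository layout (groupId with dots turned into folders, then artifactId, version, and `artifactId-version.jar`). `DependencyListToJars` is a hypothetical name, it only handles plain `jar` artifacts, and the printed paths are relative to your local repository (`~/.m2/repository` by default).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: maps `mvn dependency:list` output lines to jar paths inside a local Maven repository.
final class DependencyListToJars {

    static List<String> jarPaths(List<String> mvnOutputLines) {
        List<String> paths = new ArrayList<>();
        for (String line : mvnOutputLines) {
            String body = line.replaceFirst("^\\[INFO\\]", "").trim();
            if (body.isEmpty()) continue;
            // Some plugin versions append extra text (e.g. " -- module ...") after the coordinates
            String coords = body.split("\\s+")[0];
            String[] p = coords.split(":");
            if (p.length != 5 && p.length != 6) continue;   // not a coordinate line
            if (!p[2].equals("jar")) continue;              // sketch handles plain jars only
            String groupId = p[0];
            String artifactId = p[1];
            String classifier = p.length == 6 ? p[3] : null;
            String version = p[p.length - 2];               // scope is always the last field
            String fileName = artifactId + "-" + version
                    + (classifier != null ? "-" + classifier : "") + ".jar";
            paths.add(groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/" + fileName);
        }
        return paths;
    }

    public static void main(String[] args) {
        // Usage sketch: mvn -B dependency:list | java DependencyListToJars
        List<String> lines = new BufferedReader(new InputStreamReader(System.in))
                .lines()
                .collect(Collectors.toList());
        jarPaths(lines).forEach(System.out::println);
    }
}
```

The maven-dependency-plugin also has a copy-dependencies goal that copies the resolved jars into target/dependency directly, which may be simpler than mapping coordinates by hand.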
Upvotes: 4