Barath

Reputation: 107

PDF input format for Hadoop MapReduce

Hi, I am using the external PDFBox library to parse PDF input files in MapReduce, but I am getting the following error.

Error: java.lang.ClassNotFoundException: org.apache.pdfbox.pdmodel.PDDocument
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at com.nielsen.grfe.processor.mapreduce.Pdfparser$PdfLineRecordReader.initialize(Pdfparser.java:109)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

I am using the following dependencies:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>1.8.10</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
    <version>1.8.5</version>
</dependency>

Upvotes: 0

Views: 1115

Answers (1)

prashant khunt

Reputation: 154

1) Place the pdfbox jar in the Hadoop lib folder as well, so that the library is available to Hadoop at runtime.

2) Restart the Hadoop cluster.

Or

1) Make sure the pdfbox library is available to Hadoop by placing it in the distributed cache.
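
The stack trace goes through YarnChild, so the class is missing on the node that runs the map task, not necessarily on the machine that submitted the job. To check whether either option above actually took effect, a small diagnostic like the following could be called from the record reader or mapper before PDDocument is first used. This is only a sketch: `ClasspathCheck` and `isClassAvailable` are hypothetical names made up for illustration, not part of Hadoop or PDFBox, and the code relies on nothing beyond the standard `Class.forName` call.

```java
// Hypothetical diagnostic helper (not part of Hadoop or PDFBox).
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or something it links against) is not on the classpath
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error from the question
        String name = args.length > 0 ? args[0] : "org.apache.pdfbox.pdmodel.PDDocument";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

If the check reports false inside the task, the jar has not reached the task classpath yet. For the distributed cache route in Hadoop 2.x, `Job.addFileToClassPath(Path)` in the job driver is one way to do it, assuming the jar has already been uploaded to HDFS. The fontbox jar from the question probably needs the same treatment, since pdfbox depends on it.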

Upvotes: 0
