Isaac

Unable to get parquet-tools working from the command line

I'm trying to get the newest version of parquet-tools running, but for some reason org.apache.hadoop.conf.Configuration isn't in the shaded jar. (I have the same issue with v1.6.0.)

Is there something beyond mvn package or mvn install that I should be doing? The actual invocation I'm using is mvn install -DskipTests -pl \!parquet-thrift,\!parquet-cascading,\!parquet-pig-bundle,\!parquet-pig,\!parquet-scrooge,\!parquet-hive,\!parquet-protobuf. The build succeeds, and the tests pass if I choose to run them.

The error I get is below. As you can see, I've tried adding to the classpath the Hadoop jar from an old Parquet version that seemed to bundle it; I get the same result with or without it.

> java -classpath /path/to/hadoop-core-1.1.0.jar -jar parquet-tools-1.7.0-incubating-SNAPSHOT.jar meta --debug part-r-00000.gz.parquet

java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
    at parquet.tools.command.ShowMetaCommand.execute(ShowMetaCommand.java:59)
    at parquet.tools.Main.main(Main.java:222)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 2 more
org/apache/hadoop/conf/Configuration

Answers (5)

Jan Kronquist

On MacOS using homebrew, this is the easiest way to get started:

$ brew install parquet-tools

buryat

You can also bundle the Hadoop dependencies into the target jar:

mvn clean package -Plocal -DskipTests -Dhadoop.scope=compile
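
To confirm that a build like this actually bundled Hadoop, you can list the jar's entries and look for the class named in the stack trace. A minimal sketch, assuming unzip is installed; jar_has_class is a hypothetical helper name, not something parquet-tools provides:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the jar contains the given fully qualified class.
jar_has_class() {
  jar="$1"
  # Turn org.apache.hadoop.conf.Configuration into org/apache/hadoop/conf/Configuration.class
  entry="$(printf '%s' "$2" | tr . /).class"
  # unzip -l lists the archive entries; grep -F matches the entry path literally.
  unzip -l "$jar" | grep -q -F "$entry"
}
```

For example: jar_has_class target/parquet-tools-<version>.jar org.apache.hadoop.conf.Configuration && echo bundled, with <version> replaced by the version you built.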

jbrown

If you have Hadoop installed, run the jar through hadoop jar instead, which puts Hadoop's own jars on the classpath:

hadoop jar parquet-tools-1.7.0-incubating-SNAPSHOT.jar meta --debug part-r-00000.gz.parquet
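
This also explains why the original command fails: when java is started with -jar, the JVM loads user classes only from that jar and ignores -classpath and CLASSPATH, so the hadoop-core jar passed in the question is never seen. A rough equivalent with plain java is to drop -jar, name the main class explicitly, and append the output of hadoop classpath (assuming your Hadoop CLI provides that subcommand, as Hadoop 2.x does). The helper name run_with_hadoop_cp is hypothetical, and parquet.tools.Main is taken from the stack trace in the question:

```shell
#!/bin/sh
# Hypothetical helper: run a parquet-tools jar with Hadoop's classpath appended.
# Usage: run_with_hadoop_cp TOOLS_JAR MAIN_CLASS [parquet-tools args...]
run_with_hadoop_cp() {
  tools_jar="$1"
  main_class="$2"
  shift 2
  # -cp (not -jar) so the extra entries are honored;
  # `hadoop classpath` prints Hadoop's jars and config dirs as one classpath string.
  java -cp "${tools_jar}:$(hadoop classpath)" "$main_class" "$@"
}
```

For example: run_with_hadoop_cp parquet-tools-1.7.0-incubating-SNAPSHOT.jar parquet.tools.Main meta --debug part-r-00000.gz.parquet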

Tristan Reid

This set of steps from the parquet-mr issues list fixed the same issue for me:

mvn install
cd parquet-tools
mvn clean package -Plocal
mvn install
mvn dependency:copy-dependencies
# replace 1.8.2 in the next step with the version you're using
cp target/parquet-tools-1.8.2-SNAPSHOT.jar target/dependency/
mkdir -p ~/local/bin/lib
cp target/dependency/* ~/local/bin/lib/
cp src/main/scripts/* ~/local/bin/
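# single quotes defer $PATH expansion until ~/.profile is read; ~/ avoids writing into the current directory
echo 'export PATH="$PATH:$HOME/local/bin"' >> ~/.profile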
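
The jars copied into target/dependency can also be used directly, without the wrapper scripts. A minimal sketch, assuming Java 6 or later (where a classpath entry ending in /* expands to every jar in that directory) and a hypothetical helper named run_from_lib_dir. The main class is a parameter because it depends on the version: the stack trace in the question shows parquet.tools.Main, while the Apache releases appear to use org.apache.parquet.tools.Main (an assumption worth checking against your jar):

```shell
#!/bin/sh
# Hypothetical helper: run parquet-tools from a directory of copied dependency jars.
# Usage: run_from_lib_dir LIB_DIR MAIN_CLASS [parquet-tools args...]
run_from_lib_dir() {
  lib_dir="$1"
  main_class="$2"
  shift 2
  # Quoted so the JVM, not the shell, expands the wildcard to all jars in lib_dir.
  java -cp "${lib_dir}/*" "$main_class" "$@"
}
```

For example: run_from_lib_dir ~/local/bin/lib org.apache.parquet.tools.Main meta part-r-00000.gz.parquet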

Dennis Huo

I ran into a similar issue and fixed it by specifying the "local" profile:

mvn clean package -Plocal

I had originally missed this, but the parquet-tools README explains it: the "local" profile bundles the Hadoop dependencies into the jar, whereas the default build expects you to run the tool somewhere Hadoop is already installed and on your classpath:

https://github.com/Parquet/parquet-mr/tree/master/parquet-tools
