Reputation: 16392
I'm attempting to get the newest version of parquet-tools running, but I'm having some issues. For some reason org.apache.hadoop.conf.Configuration isn't in the shaded jar. (I have the same issue with v1.6.0 as well.)
Is there something beyond mvn package or mvn install that I should be doing? The actual mvn invocation I'm using is:
mvn install -DskipTests -pl \!parquet-thrift,\!parquet-cascading,\!parquet-pig-bundle,\!parquet-pig,\!parquet-scrooge,\!parquet-hive,\!parquet-protobuf
This works just fine, and the tests pass if I choose to run them.
The error I get is below. (You can see I've also tried putting a Hadoop jar on the classpath, taken from an old parquet version that seemed to bundle it; I get the same results with or without it.)
> java -classpath /path/to/hadoop-core-1.1.0.jar -jar parquet-tools-1.7.0-incubating-SNAPSHOT.jar meta --debug part-r-00000.gz.parquet
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at parquet.tools.command.ShowMetaCommand.execute(ShowMetaCommand.java:59)
at parquet.tools.Main.main(Main.java:222)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 2 more
org/apache/hadoop/conf/Configuration
Upvotes: 4
Views: 15734
Reputation: 2803
On macOS with Homebrew, this is the easiest way to get started:
$ brew install parquet-tools
Upvotes: 13
Reputation: 384
You can also include the Hadoop dependencies in the target jar:
mvn clean package -Plocal -DskipTests -Dhadoop.scope=compile
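To confirm that the rebuilt jar really contains the missing class, you can list its entries. This is only a rough sketch: the helper names class_to_entry and jar_has_class are made up for illustration, it assumes unzip is installed, and the jar path in the usage comment is a guess based on the version in the question.

```shell
# Convert a fully qualified class name to its entry path inside a jar,
# e.g. a.b.C becomes a/b/C.class.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeed if the jar ($1) has an entry for the class ($2).
# Assumes `unzip` is available on the PATH.
jar_has_class() {
  unzip -l "$1" | grep -qF "$(class_to_entry "$2")"
}

# Usage (jar path is an assumption, adjust to your build output):
#   jar_has_class target/parquet-tools-1.7.0-incubating-SNAPSHOT.jar \
#     org.apache.hadoop.conf.Configuration && echo "Hadoop is bundled"
```

If the check fails after the build, the Hadoop classes were not shaded in and you will hit the same NoClassDefFoundError.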
Upvotes: 6
Reputation: 7996
If you have Hadoop installed, change your command to use hadoop jar instead:
hadoop jar parquet-tools-1.7.0-incubating-SNAPSHOT.jar meta --debug part-r-00000.gz.parquet
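As a side note on why adding the Hadoop jar had no effect in the question: java ignores -classpath when -jar is given, so the extra jar was never on the classpath. If you'd rather not use hadoop jar, a possible workaround is to put both jars on the classpath and name the main class directly. The sketch below is untested against this build; the join_classpath helper is made up for illustration, parquet.tools.Main is taken from the stack trace in the question, and hadoop-core may need further dependency jars of its own.

```shell
# Join all arguments with ':' to form a Java classpath string.
join_classpath() {
  local IFS=':'
  printf '%s\n' "$*"
}

# Usage (paths are the ones from the question):
#   cp=$(join_classpath parquet-tools-1.7.0-incubating-SNAPSHOT.jar /path/to/hadoop-core-1.1.0.jar)
#   java -cp "$cp" parquet.tools.Main meta --debug part-r-00000.gz.parquet
```

If Hadoop is installed, the output of hadoop classpath could be passed as the second argument instead of a single jar.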
Upvotes: 3
Reputation: 6154
This set of steps from the parquet-mr issues list fixed the same issue for me:
mvn install
cd parquet-tools
mvn clean package -Plocal
mvn install
mvn dependency:copy-dependencies
# replace 1.8.2 in the next step with the version you're using
cp target/parquet-tools-1.8.2-SNAPSHOT.jar target/dependency/
mkdir -p ~/local/bin/lib
cp target/dependency/* ~/local/bin/lib/
cp src/main/scripts/* ~/local/bin/
# single quotes keep $PATH unexpanded until login; append to ~/.profile, not ./.profile
echo 'export PATH="$PATH:$HOME/local/bin"' >> ~/.profile
Upvotes: 1
Reputation: 10687
I ran into a similar issue and fixed it by specifying the "local" profile:
mvn clean package -Plocal
I had originally missed this paragraph, but it explains that the "local" profile mixes the Hadoop dependencies into the jar, whereas the default build expects you to run the tool somewhere Hadoop is already installed and present on your classpath:
https://github.com/Parquet/parquet-mr/tree/master/parquet-tools
Upvotes: 1