Reputation: 6566
I downloaded and built parquet-1.5.0 from https://github.com/apache/parquet-mr.
I now want to run some commands on my Parquet files that are in HDFS. I tried this:
cd ~/parquet-mr/parquet-tools/src/main/scripts
./parquet-tools meta hdfs://localhost/my_parquet_file.parquet
and I got:
Error: Could not find or load main class parquet.tools.Main
Upvotes: 0
Views: 12909
Reputation: 1525
Download jar
Download the jar from the Maven repository, or from any other location of your choice; a web search will find it. At the time of this post, I can get parquet-tools from the URL in the command below.
If you're logged in to the Hadoop box:
wget http://central.maven.org/maven2/org/apache/parquet/parquet-tools/1.9.0/parquet-tools-1.9.0.jar
This link may stop working at some point. If it does, get the current link from the Maven repository.
Build jar
If you are unable to download the jar, you can also build it from source. Clone the parquet-mr repo and build the jar:
git clone https://github.com/apache/parquet-mr
cd parquet-mr
mvn clean package
Note: you need Maven on your box to build the source.
Read parquet file
You can use these commands to view the contents of the Parquet file.
Check schema for s3/hdfs file:
hadoop jar parquet-tools-1.9.0.jar schema s3://path/to/file.snappy.parquet
hadoop jar parquet-tools-1.9.0.jar schema hdfs://path/to/file.snappy.parquet
Head file contents:
hadoop jar parquet-tools-1.9.0.jar head -n5 s3://path/to/file.snappy.parquet
Check contents of local file:
java -jar parquet-tools-1.9.0.jar head -n5 /tmp/path/to/file.snappy.parquet
java -jar parquet-tools-1.9.0.jar schema /tmp/path/to/file.snappy.parquet
More commands:
hadoop jar parquet-tools-1.9.0.jar --help
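The commands above follow one pattern: hadoop jar for hdfs:// and s3:// paths, java -jar for local files. A minimal sketch of that choice as a shell helper follows. The function name launcher_for is hypothetical (it is not part of parquet-tools), and it only recognizes the two URI schemes used in this answer.

```shell
# Hypothetical helper (not part of parquet-tools): print the launcher
# that matches the examples above for a given file path.
launcher_for() {
    case $1 in
        # Remote paths as used above: go through the Hadoop launcher.
        hdfs://*|s3://*) echo "hadoop jar" ;;
        # Anything else is treated as a local file.
        *)               echo "java -jar" ;;
    esac
}
```

In a POSIX sh or bash session this could be used as: f=/tmp/path/to/file.snappy.parquet; $(launcher_for "$f") parquet-tools-1.9.0.jar schema "$f"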
Upvotes: 3
Reputation: 3105
The script is built on the assumption that parquet-tools-<version>.jar is located in a directory called lib next to the script file itself, like so:
$ find -type f
./parquet-tools
./lib/parquet-tools-1.10.1-SNAPSHOT.jar
You can set up such a file layout by issuing the following commands from the root of the parquet-mr git repo (of course many alternative ways and installation locations are possible):
mkdir -p ~/.local/share/parquet-tools/lib
cp parquet-tools/src/main/scripts/parquet-tools ~/.local/share/parquet-tools/
cp parquet-tools/target/parquet-tools-1.5.0.jar ~/.local/share/parquet-tools/lib
After this you can run ~/.local/share/parquet-tools/parquet-tools. (I tested this with version 1.10.1-SNAPSHOT, though, instead of 1.5.0.)
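To check that a directory has the layout the script expects before running it, something like the following could help. This is only a sketch: find_tools_jar is a hypothetical helper, not code taken from the parquet-tools script, and it assumes the jar is named parquet-tools-<version>.jar as described above.

```shell
# Hypothetical helper (not taken from the parquet-tools script):
# given the directory that holds the wrapper script, print the first
# parquet-tools jar found in its sibling lib/ directory.
find_tools_jar() {
    script_dir=$1
    for jar in "$script_dir"/lib/parquet-tools-*.jar; do
        # If nothing matches, the pattern stays literal and -f fails.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    printf 'no parquet-tools jar under %s/lib\n' "$script_dir" >&2
    return 1
}
```

For example, find_tools_jar ~/.local/share/parquet-tools should print the path of the copied jar once the cp commands above have run. A non-zero exit status means the lib directory or the jar is missing, which is the situation that leads to the "Could not find or load main class" error in the question.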
Upvotes: 0