Ananth Duari

Reputation: 2879

Convert Avro to Parquet format

I want to export data from a database and convert it into Avro + Parquet format. Sqoop supports Avro export but not Parquet. I have tried to convert the Avro objects to Parquet using Apache Pig, Apache Crunch, etc., but nothing is working.

Apache Pig gives me "Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist", even though the input path does exist at that location.

Apache Crunch always throws "java.lang.ClassNotFoundException: Class org.apache.crunch.impl.mr.run.CrunchMapper not found", even though I added it to the Hadoop lib path.

What is the best and easiest way to export data from a database into Parquet format?

Upvotes: 2

Views: 6450

Answers (3)

Ted Dunning

Reputation: 1907

The most recent Sqoop (1.4.6, I think) supports importing directly to files in Parquet format, and also importing to Parquet with associated Hive table creation.
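As a rough illustration, a minimal Sqoop 1.4.6 invocation might look like the following; the JDBC URL, credentials, and table/directory names are all placeholders:

    # Import a table straight to Parquet files (Sqoop 1.4.6+).
    # Connection string and names below are placeholders.
    sqoop import \
      --connect jdbc:mysql://dbhost/mydb \
      --username myuser -P \
      --table mytable \
      --target-dir /user/me/mytable_parquet \
      --as-parquetfile

    # To also create the Hive table in the same step, replace
    # --target-dir with: --hive-import --hive-table mytable_parquet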

Upvotes: 2

Pratik Khadloya

Reputation: 12869

I was able to dump a MySQL table into an Avro file using Sqoop 1, and then convert the Avro file into a Parquet file using the avro2parquet conversion tool (https://github.com/tispratik/avro2parquet). Once it was in Parquet, I could upload it to HDFS and create a Hive table on top of it (a sketch of that step is below). Note that you need a Parquet plugin in Hive if you are running a version prior to 0.13; Hive supports Parquet natively as of 0.13.
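For the last step, a minimal sketch assuming Hive 0.13+ (native Parquet support); the HDFS location and column list are placeholders:

    -- Expose the uploaded Parquet files as a Hive table (Hive 0.13+).
    -- Location and columns below are placeholders.
    CREATE EXTERNAL TABLE mytable_parquet (
      id INT,
      name STRING
    )
    STORED AS PARQUET
    LOCATION '/user/me/mytable_parquet';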

Upvotes: 0

Gwen Shapira

Reputation: 5158

I use Hive.

Create an external table on the Avro data. Create an empty Parquet table.

And then run: insert overwrite table PARQUET_TABLE select * from AVRO_TABLE.
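A minimal sketch of the whole flow; table names, columns, and the HDFS path are placeholders, and STORED AS AVRO assumes Hive 0.14+ (older versions need the AvroSerDe declared explicitly):

    -- 1. External table over the existing Avro data.
    CREATE EXTERNAL TABLE avro_table (
      id INT,
      name STRING
    )
    STORED AS AVRO
    LOCATION '/user/me/mytable_avro';

    -- 2. Empty Parquet table with the same schema (Hive 0.13+).
    CREATE TABLE parquet_table (
      id INT,
      name STRING
    )
    STORED AS PARQUET;

    -- 3. Rewrite the Avro rows as Parquet.
    INSERT OVERWRITE TABLE parquet_table SELECT * FROM avro_table;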

Super easy :)

Upvotes: 3
