user2159301

Reputation: 119

Sqoop snappy compression not working

I have the following Sqoop script, which is supposed to import the data as Parquet files using Snappy compression:

sqoop import \
--hive-drop-import-delims \
--fields-terminated-by '\001' \
--connect '<Connection URL>' \
--query 'select * from <db_name>.<table_name> where $CONDITIONS' \
--username <username> \
--password <password> \
--split-by '<split-by-key>' \
-m 4 \
--input-null-string '' \
--input-null-non-string '' \
--inline-lob-limit 0 \
--target-dir <hdfs/location/where/files/should/land> \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--as-parquetfile \
--map-column-java NOTES_DETAIL=String,NOTES=String

Once the script finishes successfully, I go into the HDFS location ['hdfs/location/where/files/should/land'] and see that Snappy compression has not been applied and that no _SUCCESS file shows up. Why is this happening?

This is what I see when I list the files in that folder:

21cbd1a6-d58b-4fdc-b332-7433e582ce0b.parquet
3956b0ff-58fd-4a87-b383-4fecc337a72a.parquet
3b42a1a9-4aa7-4668-bdd8-41624dec5ac6.parquet

As you can see, there is no .snappy in the file names and no _SUCCESS file.

Upvotes: 0

Views: 2678

Answers (2)

Anoop Velluva

Reputation: 329

Enable compression using the following parameter:

-z,--compress

Reference: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html
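A minimal sketch of the question's command with compression switched on, assuming Sqoop 1.4.x, where -z/--compress enables compression and --compression-codec selects the codec. Only --compress is new (plus -m 4 in place of -m=4); the <...> placeholders are the question's own, quoted here so the shell does not treat < and > as redirections. Depending on the Sqoop version, Parquet imports may expect a short codec name (for example snappy) instead of the Hadoop codec class, so check the user guide for the version you run.

```shell
# Sketch: the question's import with compression explicitly enabled.
# Assumption: Sqoop 1.4.x; all <...> values are placeholders from the question.
sqoop import \
  --hive-drop-import-delims \
  --fields-terminated-by '\001' \
  --connect '<Connection URL>' \
  --query 'select * from <db_name>.<table_name> where $CONDITIONS' \
  --username '<username>' \
  --password '<password>' \
  --split-by '<split-by-key>' \
  -m 4 \
  --input-null-string '' \
  --input-null-non-string '' \
  --inline-lob-limit 0 \
  --target-dir '<hdfs/location/where/files/should/land>' \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --as-parquetfile \
  --map-column-java NOTES_DETAIL=String,NOTES=String
```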

Upvotes: 0

Uwe L. Korn

Reputation: 8796

You can't tell from a Parquet file's extension which compression was used. In Parquet files, the data is compressed internally in chunks. When you select a codec, you specify which codec is used for every chunk in the file. The Parquet specification does, however, allow each column chunk to use its own codec, so you could even mix codecs inside a single Parquet file. Some tools produce .snappy.parquet files to indicate the chosen compression codec, but that naming is purely decorative: the compression information is stored in the file's metadata.

To check whether your Parquet files have been Snappy-compressed, inspect them with parquet-tools (for example, its meta command, which prints the file's metadata).
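A rough sketch of that check, assuming the hdfs client and the classic parquet-mr parquet-tools CLI are on the PATH, and that parquet-tools meta names the codec (for example SNAPPY or UNCOMPRESSED) on each column-chunk line of its output. The helper summarize_codecs is hypothetical, and newer Parquet releases may ship a different CLI (parquet-cli) instead, so the command may need adapting.

```shell
#!/usr/bin/env bash
# Sketch: report which compression codec each Sqoop-written Parquet file uses.
# Assumptions: "hdfs" and the parquet-mr "parquet-tools" CLI are installed.
set -euo pipefail

# Hypothetical helper: count codec names in `parquet-tools meta` output on stdin.
# Prints nothing (and still succeeds) if no known codec name is found.
summarize_codecs() {
  grep -Eow 'UNCOMPRESSED|SNAPPY|GZIP|LZO|BROTLI|LZ4|ZSTD' | sort | uniq -c || true
}

TARGET_DIR="<hdfs/location/where/files/should/land>"  # placeholder from the question

if command -v hdfs >/dev/null 2>&1 && command -v parquet-tools >/dev/null 2>&1; then
  LOCAL_DIR="$(mktemp -d)"
  # Quote the glob so HDFS, not the local shell, expands it.
  hdfs dfs -get "${TARGET_DIR}/*.parquet" "${LOCAL_DIR}/"
  for f in "${LOCAL_DIR}"/*.parquet; do
    echo "== $(basename "${f}")"
    parquet-tools meta "${f}" | summarize_codecs
  done
fi
```

If every file reports only SNAPPY, the data is Snappy-compressed even though the file names end in plain .parquet.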

Upvotes: 2
