Fabio

Reputation: 195

How to limit parquet file dimension for a parquet table in hive?

I'm trying to create a Parquet table in Hive. I can create it, but when I run analyze table mytable compute statistics; I get this result:

numFiles=800, numRows=10000000, totalSize=18909876, rawDataSize=40000000

Why is the table made up of 800 files for only 180 MB? Is there a way to set the number of files? I tried SET parquet.block.size=134217728, but the result is the same.

Upvotes: 3

Views: 2523

Answers (2)

Tagar

Reputation: 14871

The number of reducers determines the number of Parquet files.

Check the mapred.reduce.tasks parameter.

E.g. you may have a map-reduce job that produces just 100 rows, but if mapred.reduce.tasks is set to 800 (explicitly or implicitly), you'll have 800 Parquet files as output (most of the Parquet files will contain only headers and no actual data).
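As a minimal sketch (the table names mytable and mytable_staging and the column somekey are hypothetical), you could cap the reducer count for the session before the insert; a DISTRIBUTE BY clause forces a reduce phase so the setting actually controls the number of output files:

-- limit the number of reducers, and therefore the number of output Parquet files
SET mapred.reduce.tasks=8;
-- DISTRIBUTE BY forces a shuffle, so each reducer writes one output file
INSERT OVERWRITE TABLE mytable
SELECT * FROM mytable_staging
DISTRIBUTE BY somekey;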

Upvotes: 2

Vineet Srivastava

Reputation: 23

You also need to set dfs.blocksize=134217728 along with SET parquet.block.size=134217728. Both block sizes should be set when doing the Hive insert.
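For example, a minimal sketch (mytable and mytable_staging are hypothetical table names) setting both sizes in the same session before the insert:

SET dfs.blocksize=134217728;
SET parquet.block.size=134217728;
INSERT OVERWRITE TABLE mytable
SELECT * FROM mytable_staging;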

Upvotes: 0
