Reputation: 48
I am testing the Parquet file format and inserting data into Parquet files through an Impala external table.
The following query options may affect the Parquet file size:
NUM_NODES: 1
PARQUET_COMPRESSION_CODEC: none
PARQUET_FILE_SIZE: 1073741824
I am using the following INSERT statement to write into the Parquet table:
INSERT INTO TABLE parquet_test.parquetTable
PARTITION (pkey=X)
SELECT col1, col2, col3 FROM map_impala_poc.textTable WHERE col1 % 100 = X;
I want to generate files of approximately 1 GB, and I have partitioned the data accordingly so that each partition holds slightly less than 1 GB in Parquet format. However, this INSERT operation never produces a single file larger than 512 MB: it writes 512 MB of data into one file, then creates a second file and writes the rest of the data there. What can be done to write all the data into a single file?
Upvotes: 2
Views: 1335
Reputation: 421
Try setting the Parquet file size in the same session in which you execute the query, since query options are session-scoped:

set PARQUET_FILE_SIZE=1g;
INSERT INTO TABLE parquet_test.parquetTable ...
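For reference, the whole sequence in a single impala-shell session might look like the sketch below (table, column, and partition names are reused from the question; the `1g` shorthand assumes your Impala version accepts unit suffixes, otherwise pass the byte count `1073741824`):

```sql
-- Run everything in ONE impala-shell session: SET only affects the current session.
SET NUM_NODES=1;
SET PARQUET_COMPRESSION_CODEC=none;
SET PARQUET_FILE_SIZE=1g;  -- or: SET PARQUET_FILE_SIZE=1073741824;

INSERT INTO TABLE parquet_test.parquetTable
PARTITION (pkey=X)
SELECT col1, col2, col3 FROM map_impala_poc.textTable WHERE col1 % 100 = X;
```

You can confirm the option took effect before running the INSERT by issuing `SET;` with no arguments, which lists the current session's query options.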
Upvotes: 1