SQL.injection

Reputation: 2647

Parquet hive table on s3

I am attempting (unsuccessfully) to create a Parquet Hive table on S3.

create external table sequencefile_s3
(user_id bigint, 
creation_dt string
)
stored as sequencefile location 's3a://bucket/sequencefile';

The sequence file table works perfectly.

create external table parquet_s3
(user_id bigint,
creation_dt string)
stored as parquet location 's3a://bucket/parquet';

insert into parquet_s3
select * from hdfs_data;

Parquet does not work. The files are created in the S3 bucket/folder, and select count(*) works; however, select * from parquet_s3 limit 10 does not.

Other notes: I am running Cloudera distribution 5.8 outside AWS/EC2. S3a is properly configured (I can copy files through distcp, and the S3 sequencefile and textfile external tables work perfectly).
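
For context, a minimal sketch of the checks involved (describe formatted is standard HiveQL; parquet_s3 is the table defined above):

-- inspect the SerDe, input/output format, and location Hive recorded
describe formatted parquet_s3;

-- the symptom: aggregation works, but reading rows does not
select count(*) from parquet_s3;    -- returns the expected count
select * from parquet_s3 limit 10;  -- fails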

Upvotes: 4

Views: 2271

Answers (1)

Alper t. Turker

Reputation: 35249

First of all, the problem statement is not clear: what exactly does "does not work" mean?
Error logs are also important: which command do you run, and what output do you get?
All I can say for now is that Hive has its own SEQUENCEFILE reader and SEQUENCEFILE writer libraries for reading and writing sequence files.
It uses the SEQUENCEFILE input and output formats from these packages (an explicit equivalent is sketched below the list):

  • org.apache.hadoop.mapred.SequenceFileInputFormat
  • org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
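
For illustration, here is the same declaration with those classes spelled out (a sketch only; sequencefile_explicit is a hypothetical name, otherwise equivalent to the working table in the question):

-- explicit equivalent of "stored as sequencefile"
create external table sequencefile_explicit
(user_id bigint,
creation_dt string)
stored as
inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
location 's3a://bucket/sequencefile';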

Use the table property below when you create your Parquet table, and try again:

tblproperties ("parquet.compression"="SNAPPY");
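
Put together with the DDL from the question, the create statement would look like this (a sketch; tblproperties goes after the location clause):

-- Parquet table with Snappy compression declared as a table property
create external table parquet_s3
(user_id bigint,
creation_dt string)
stored as parquet
location 's3a://bucket/parquet'
tblproperties ("parquet.compression"="SNAPPY");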

Upvotes: 1
