USB

Reputation: 6139

How to retrieve data from a specific bucket in Hive

I created a table in Hive:

create table HiveMB
  (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
  clustered by (Department) into 3 buckets
  stored as orc TBLPROPERTIES ('transactional'='true');

where my data file looks like

1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
3,Janet,Sales,60000,A
4,Hari,Admin,50000,C
5,Sanker,Admin,50000,C

and the data went into 3 buckets by Department.

When I examined the warehouse, there were 3 bucket files:

Found 3 items
-rwxr-xr-x   3 aibladmin hadoop     252330 2014-11-28 14:46 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000
-rwxr-xr-x   3 aibladmin hadoop     100421 2014-11-28 14:45 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00001
-rwxr-xr-x   3 aibladmin hadoop     313047 2014-11-28 14:46 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00002

How can I retrieve the contents of one such bucket?

When I did a -cat, it was not in a human-readable format, showing something like

`J�lj�(��rwNj��[��Y���gR�� \�B�Q_Js)�6 �st�A�6�ixt� R �
ޜ�KT� e����IL Iԋ� ł2�2���I�Y��FC8 /2�g� ����� > ������q�D � b�` `�`���89$ $$ ����I��y|@޿    
                                                                                                %\���� �&�ɢ`a~ � S �$�l�:y���K $�$����X�X��)Ě���U*��
6.  �� �cJnf� KHjr�ć����� ��(p` ��˻_1s  �5ps1:  1:I4L\��u

How can I see the data stored in each bucket?

My file is in CSV format, not ORC, so as a workaround I did this.

But I am still not able to view the data in the buckets; it is not in a human-readable format.

Upvotes: 3

Views: 3319

Answers (4)

ajj

Reputation: 101

You can see the contents of an ORC bucket file with the command:

hive --orcfiledump [path-to-the-bucket]
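For example, pointing it at one of the bucket files listed in the question prints the file's structure and column statistics in readable form (recent Hive releases also accept a -d flag to dump the rows themselves, though whether it is available depends on your version):

hive --orcfiledump /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000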

Upvotes: 0

Ravi H

Reputation: 596

I am uploading an ORC screenshot which was produced from these Hive queries:

create table stackOverFlow 
(EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
row format delimited
fields terminated by ',';

load data local inpath '/home/ravi/stack_file.txt'
overwrite into table stackOverFlow;

and

create table stackOverFlow6
  (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
  clustered by (Department) into 3 buckets
  row format delimited
  fields terminated by ','
  stored as orc tblproperties ("orc.compress"="ZLIB");

insert overwrite table stackOverFlow6 select * from stackOverFlow;

The generated ORC result file for the above Hive queries is shown in the uploaded screenshot.
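One caveat, stated as an assumption about Hive versions rather than something shown above: before Hive 2.0, an insert only distributes rows into the declared buckets when bucketing enforcement is switched on, so the insert may need to be preceded by:

-- only needed on Hive releases older than 2.0, where bucketing is not enforced by default
set hive.enforce.bucketing = true;
insert overwrite table stackOverFlow6 select * from stackOverFlow;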

Upvotes: 2

Ravi H

Reputation: 596

Your table:

create table HiveMB
  (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
  clustered by (Department) into 3 buckets
  stored as orc TBLPROPERTIES ('transactional'='true');

You have chosen ORC as the table's storage format, which means Hive compresses the actual data and stores it in a binary, compressed form, so -cat on the files will not show readable text.
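The compressed files are not meant to be read directly with -cat; querying the table through Hive decodes them back into the original rows, for example:

-- Hive reads the ORC files and returns the rows in readable form
select * from HiveMB;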

Upvotes: 0

Kishore

Reputation: 5881

create table HiveMB1
  (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
  row format delimited
  fields terminated by ',';

load data local inpath '/home/user17/Data/hive.txt'
overwrite into table HiveMB1;


create table HiveMB2
  (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
  clustered by (Department) into 3 buckets
  row format delimited
  fields terminated by ',';

insert overwrite table HiveMB2 select * from HiveMB1;



user17@BG17:~$ hadoop dfs -ls /user/hive/warehouse/hivemb2
Found 3 items
-rw-r--r--   1 user17 supergroup         22 2014-12-01 15:52 /user/hive/warehouse/hivemb2/000000_0
-rw-r--r--   1 user17 supergroup         44 2014-12-01 15:53 /user/hive/warehouse/hivemb2/000001_0
-rw-r--r--   1 user17 supergroup         43 2014-12-01 15:53 /user/hive/warehouse/hivemb2/000002_0

user17@BG17:~$ hadoop dfs -cat /user/hive/warehouse/hivemb2/000000_0
2,Gokul,Admin,50000,B

user17@BG17:~$ hadoop dfs -cat /user/hive/warehouse/hivemb2/000001_0
4,Hari,Admin,50000,C
5,Sanker,Admin,50000,C

user17@BG17:~$ hadoop dfs -cat /user/hive/warehouse/hivemb2/000002_0
1,Anne,Admin,50000,A
3,Janet,Sales,60000,A
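If the goal is simply to retrieve the rows of one bucket without touching the HDFS paths at all, Hive's TABLESAMPLE clause can address the buckets of a clustered table directly. A sketch against the HiveMB2 table above (bucket numbering starts at 1):

-- read only the first of the 3 buckets, sampling on the clustering column
select * from HiveMB2 tablesample(bucket 1 out of 3 on Department);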

Upvotes: 0
