Reputation: 6139
I created a table in Hive:
create table HiveMB
(EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
clustered by (Department) into 3 buckets
stored as orc TBLPROPERTIES ('transactional'='true') ;
where my input file looks like this:
1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
3,Janet,Sales,60000,A
4,Hari,Admin,50000,C
5,Sanker,Admin,50000,C
and the data went into 3 buckets by department.
When I examined the warehouse, there were 3 bucket files:
Found 3 items
-rwxr-xr-x 3 aibladmin hadoop 252330 2014-11-28 14:46 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000
-rwxr-xr-x 3 aibladmin hadoop 100421 2014-11-28 14:45 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00001
-rwxr-xr-x 3 aibladmin hadoop 313047 2014-11-28 14:46 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00002
How will I be able to retrieve one such bucket?
When I did a -cat, the output was not in human-readable format, showing something like:
`J�lj�(��rwNj��[��Y���gR�� \�B�Q_Js)�6 �st�A�6�ixt� R �
ޜ�KT� e����IL Iԋ� ł2�2���I�Y��FC8 /2�g� ����� > ������q�D � b�` `�`���89$ $$ ����I��y|@
%\���� �&�ɢ`a~ � S �$�l�:y���K $�$����X�X��)Ě���U*��
6. �� �cJnf� KHjr�ć����� ��(p` ��˻_1s �5ps1: 1:I4L\��u
How can I see the data stored in each bucket?
My file is in CSV format, not ORC, so as a workaround I did this, but I am still not able to view the data in the buckets; it is not in human-readable format.
Upvotes: 3
Views: 3319
Reputation: 101
You can inspect the ORC contents of a bucket with the command:
hive --orcfiledump [path-to-the-bucket]
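For example, on one of the bucket files listed in the question (the path is taken from the question's directory listing; on newer Hive releases a -d flag also dumps the row data as JSON, though whether it is available depends on your Hive version):
hive --orcfiledump /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000
hive --orcfiledump -d /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000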
Upvotes: 0
Reputation: 596
I am uploading an ORC screenshot, which was produced from these Hive queries:
create table stackOverFlow
(EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
row format delimited
fields terminated by ',';
load data local inpath '/home/ravi/stack_file.txt'
overwrite into table stackOverFlow;
and
create table stackOverFlow6
(EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
clustered by (Department) into 3 buckets
row format delimited
fields terminated by ','
stored as orc tblproperties ("orc.compress"="ZLIB");
insert overwrite table stackOverFlow6 select * from stackOverFlow;
Generated ORC result file for the above Hive queries:
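One caveat, hedged: on Hive releases before 2.0, bucketing is not enforced by default, so the insert above may not actually produce 3 bucket files. A minimal sketch, assuming such an older release:
-- assumption: Hive < 2.0, where bucketing must be enforced explicitly
set hive.enforce.bucketing = true;
insert overwrite table stackOverFlow6 select * from stackOverFlow;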
Upvotes: 2
Reputation: 596
Your table:
> create table HiveMB
(EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
clustered by (Department) into 3 buckets
stored as orc TBLPROPERTIES ('transactional'='true') ;
You have chosen ORC as the table format, which means Hive compresses the actual data and stores it in a compressed, binary form, so it will not be readable with -cat.
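If the goal is just to read the rows of a single bucket, you can still query them through Hive instead of -cat. A minimal sketch using TABLESAMPLE against the question's HiveMB table (bucket numbers are 1-based; since the table is clustered by Department into 3 buckets, Hive can prune the scan to the matching bucket file):
-- retrieve the rows of the first bucket in readable form
select * from HiveMB tablesample(bucket 1 out of 3 on Department) s;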
Upvotes: 0
Reputation: 5881
create table HiveMB1
(EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
row format delimited
fields terminated by ',';
load data local inpath '/home/user17/Data/hive.txt'
overwrite into table HiveMB1;
create table HiveMB2
(EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
clustered by (Department) into 3 buckets
row format delimited
fields terminated by ',';
insert overwrite table HiveMB2 select * from HiveMB1 ;
user17@BG17:~$ hadoop dfs -ls /user/hive/warehouse/hivemb2
Found 3 items
-rw-r--r-- 1 user17 supergroup 22 2014-12-01 15:52 /user/hive/warehouse/hivemb2/000000_0
-rw-r--r-- 1 user17 supergroup 44 2014-12-01 15:53 /user/hive/warehouse/hivemb2/000001_0
-rw-r--r-- 1 user17 supergroup 43 2014-12-01 15:53 /user/hive/warehouse/hivemb2/000002_0
user17@BG17:~$ hadoop dfs -cat /user/hive/warehouse/hivemb2/000000_0
2,Gokul,Admin,50000,B
user17@BG17:~$ hadoop dfs -cat /user/hive/warehouse/hivemb2/000001_0
4,Hari,Admin,50000,C
5,Sanker,Admin,50000,C
user17@BG17:~$ hadoop dfs -cat /user/hive/warehouse/hivemb2/000002_0
1,Anne,Admin,50000,A
3,Janet,Sales,60000,A
Upvotes: 0