Reputation: 99
Whenever I use the Hive CLI and run a query, instead of seeing the table's rows I just get an "OK" or an error. I understand that this is probably because many tables are extremely large, but if I just want to test my code/logic on a subset of the data, how can I view the full query results to check correctness?
hive> select * from input;
OK
Time taken: 0.085 seconds
Upvotes: 2
Views: 6213
Reputation: 3897
My best guess is that your table has no data behind it. Did you create the table and forget to put your files in the proper HDFS directory? Run:
hive> describe formatted my_table;
Then take a look at the HDFS location reported by that command:
hive> !hadoop fs -ls /location/obtained/from/describe/command;
You should see your files there. If not, copy them into that directory (hadoop fs -put from the local filesystem, or hadoop fs -cp from elsewhere in HDFS) and try querying again.
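If you would rather do the check above from a regular shell, here is a rough sketch. It assumes the describe formatted output contains a line whose first field is "Location:" followed by the path; table_location is a made-up helper name and my_table is a placeholder.

```shell
# Read `describe formatted` output on stdin and print the table's location.
# Assumption: one line has "Location:" as its first field and the path as
# its second. table_location is a hypothetical helper name.
table_location() {
  awk '$1 == "Location:" { print $2; exit }'
}

# Usage (requires hive and hadoop on the PATH; my_table is a placeholder):
#   loc=$(hive -e "describe formatted my_table;" | table_location)
#   hadoop fs -ls "$loc"
```

An empty listing for that path would fit the "no data behind the table" explanation.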
Testing Code
If you have a simple expression you want to test, you can create a dual-like table in Hive, with one column and one row:
create table dual (x int);
insert into table dual select count(*)+1 as x from dual;
Test an expression on this table like you would in SQL:
select split('3,2,1','\\,') as my_new_array from dual;
Beyond this, it is a good idea to test your results on a subset of the data, as you mentioned. After applying whatever additional transformations you want, you can write the results to a text file, or to a file you can open in Excel or another tool, and review them:
--grab a subset of the table
CREATE TABLE my_table_subset LIKE my_table;
INSERT OVERWRITE TABLE my_table_subset
SELECT * FROM my_table
TABLESAMPLE (1 PERCENT) t;
If you don't want a random subset, you will have to write a query that targets the subset you prefer. Then write it to a file in whatever format you like, as described above:
hive -e "select * from my_table_subset limit 1000" > /localfilesystem/path/myexcel.xls;
Excel has limitations as a file viewer, so something else may be preferable. This becomes a problem when the data gets really large; you may need an editor such as UltraEdit. Good luck! Hope this helps.
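Before opening the export in any viewer, a quick sanity check from the shell can help. This is only a sketch: it assumes the file came from redirecting hive -e output, so columns are tab-separated, and summarize_tsv is a made-up helper name.

```shell
# Print the row count and the column count of the first row of a
# tab-separated file. Assumption: the file was written by redirecting
# `hive -e "..."` output, which separates columns with tabs.
summarize_tsv() {
  file="$1"
  # Count lines; strip the padding some wc implementations add.
  rows=$(wc -l < "$file" | tr -d ' ')
  # Count tab-separated fields on the first line.
  cols=$(head -n 1 "$file" | awk -F '\t' '{ print NF }')
  echo "rows=$rows cols=$cols"
}

# Usage: summarize_tsv /localfilesystem/path/myexcel.xls
```

If the row count is 0, the query returned nothing, which points back to the empty-table check rather than to a viewer problem.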
Upvotes: 1
Reputation: 3845
From what I understand, your tables don't have any data, which is why nothing is displayed. In general, a 'select *' displays the data no matter how large the table is. It is basically equivalent to a 'cat' command and has nothing to do with the size of your tables.
If you want to work on a subset of your data, the best approach is to create a partition. If your data is stored in a way that makes partitioning impossible, I would recommend creating a temp table with some 1000-2000 rows and trying out your queries on that.
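A minimal sketch of the temp-table idea, written as a shell helper that only builds the HiveQL statement. The name sample_table_sql and the _sample suffix are made up for illustration, and it assumes a plain CREATE TABLE ... AS SELECT is acceptable for your table. Note that LIMIT keeps whichever rows Hive happens to return first, not a random sample.

```shell
# Build a CREATE TABLE ... AS SELECT statement that copies up to N rows of
# a table into <table>_sample. Hypothetical helper; names are placeholders.
sample_table_sql() {
  src="$1"           # source table name
  rows="${2:-1000}"  # number of rows to copy (defaults to 1000)
  printf 'CREATE TABLE %s_sample AS SELECT * FROM %s LIMIT %s;\n' "$src" "$src" "$rows"
}

# Usage (requires the Hive CLI on the PATH; my_table is a placeholder):
#   hive -e "$(sample_table_sql my_table 2000)"
```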
Upvotes: 0