user2876812
user2876812

Reputation: 336

splitting data into multiple tables in hive

I have a data which contains 100 million rows.I need to split it into tables of 10 million each.How could this be done in hive.

The effective sql query is:

 select * from Customers where rownum>0 and rownum<=1000000  
 select * from Customers where rownum>1000000 and rownum<=2000000  

and so on..

Thanks

Upvotes: 1

Views: 2252

Answers (1)

Joe K
Joe K

Reputation: 18434

Hive has ROW_NUMBER as an analytics function if you're using a recent version.

Otherwise, you could do something like:

from (select Customers.*, rand() as r from Customers) t
insert overwrite table Customers_sample1 select * where r < 0.1
insert overwrite table Customers_sample2 select * where r >=0.1 and r < 0.2
insert overwrite table Customers_sample3 select * where r >=0.2 and r < 0.3
...

Upvotes: 3

Related Questions