Reputation: 336
I have a data which contains 100 million rows.I need to split it into tables of 10 million each.How could this be done in hive.
The effective sql query is:
select * from Customers where rownum>0 and rownum<=1000000
select * from Customers where rownum>1000000 and rownum<=2000000
and so on..
Thanks
Upvotes: 1
Views: 2252
Reputation: 18434
Hive has ROW_NUMBER
as an analytics function if you're using a recent version.
Otherwise, you could do something like:
from (select Customers.*, rand() as r from Customers) t
insert overwrite table Customers_sample1 select * where r < 0.1
insert overwrite table Customers_sample2 select * where r >=0.1 and r < 0.2
insert overwrite table Customers_sample3 select * where r >=0.2 and r < 0.3
...
Upvotes: 3