Reputation: 91
I am running a simple join query
select count(*) from t1 join t2 on t1.sno=t2.sno
Table t1 and t2 both have 20 million records each and column sno is of string data type.
The table data is imported in to HDFS from Amazon s3 in rcfile format. The query took 109s with 15 Amazon large instances however it takes 42sec on sql server with 16 GB RAM and 16 cpu cores.
Am I missing anything? Can't understand why am I getting slow performance on Amazon?
Upvotes: 2
Views: 425
Reputation: 8269
Some questions to help you tune Hadoop performance:
sql-server might be fine with 40mm records, but wait till you have 2bn records and see how it does. It will probably just break. I'd see hive more as a clever wrapper for Map Reduce rather than an alternative to a real database.
Also from experience I think having 15 c1.mediums might perform just as well as the large machines, if not better. the large machines don't have the right balance of CPU/Memory honestly.
Upvotes: 2