Reputation: 658
I implemented the same data warehouse star schema in Hive and MySQL. I would like to demonstrate that Hive is better in terms of query response time.
But I tried with a few thousand records, and MySQL seems faster than Hive. I think Hive is better with millions of records (on the order of gigabytes).
The problem is that I don't have millions of records, and I don't have a hardware cluster for Hadoop.
How can I do it?
If I assume a cluster of 10 servers, could I divide the Hive query execution time by 10? Thank you.
Upvotes: 0
Views: 2662
Reputation: 20836
Actually, for just millions of records, I think MySQL is better.
Do you really need Hive? How will you use the data? Hive is not suited to real-time analysis; it's for offline analysis. Basically, a single SQL query takes at least dozens of seconds to run in Hive. But for just millions of records, a query in MySQL can return in less than 1 second, if your schema is well designed and the indexes are created correctly.
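To make the point about indexes concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, only so that it runs without a server, and the table and index names (fact_sales, idx_fact_date) are made up for illustration. It shows the query plan switching from a full table scan to an index lookup once the filtered column is indexed; MySQL's EXPLAIN gives the same kind of information.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")

# Hypothetical fact table of a star schema (names are assumptions).
conn.execute(
    "CREATE TABLE fact_sales (date_id INTEGER, product_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    ((i % 365, i % 1000, float(i % 50)) for i in range(100_000)),
)

sql = "SELECT SUM(amount) FROM fact_sales WHERE date_id = ?"

# Without an index the engine has to scan every row.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filter column it can jump to the matching rows.
conn.execute("CREATE INDEX idx_fact_date ON fact_sales (date_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, and MySQL's optimizer is a different engine, so treat this only as an illustration of what "indexes created correctly" buys you.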
In addition, "If I assume a cluster of 10 servers, could I divide the Hive query execution time by 10?": no, this is wrong. Different queries may have different speed-up ratios. It also depends on the data distribution. In the extreme case, Hive may use only one machine to run the query, e.g., for a cross join.
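A rough way to see why 10 servers do not give a 10x speed-up is an Amdahl's-law style estimate. This is a simplified model, not how Hive actually plans or schedules a query: the function names, the parallel_fraction parameter, and the fixed startup cost are all assumptions for illustration.

```python
def speedup(parallel_fraction, servers):
    """Amdahl's-law speed-up when only part of the work can be spread out.

    parallel_fraction: share of the single-machine run time (0.0 to 1.0)
    that can be divided evenly across `servers`. This is an assumed input,
    not something Hive reports.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / servers)


def estimated_time(single_machine_seconds, parallel_fraction, servers,
                   startup_seconds=0.0):
    """Estimated run time on a cluster, plus a fixed per-query startup cost."""
    serial = single_machine_seconds * (1.0 - parallel_fraction)
    parallel = single_machine_seconds * parallel_fraction / servers
    return startup_seconds + serial + parallel


if __name__ == "__main__":
    for fraction in (1.0, 0.9, 0.5, 0.0):
        print(f"parallel fraction {fraction:.1f}: "
              f"speed-up on 10 servers = {speedup(fraction, 10):.2f}x")
```

Only a query whose work is 100% parallelizable gets the full 10x. With 90% parallel work the model gives about 5.3x, and a query that ends up on a single machine (parallel fraction 0) gets no speed-up at all.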
Upvotes: 1