Srihari Karanth

Reputation: 2167

Performance of AWS EMR over S3 compared to a server with hard disk storage

We have around 10 TB of data from a customer, which we have to load and query using Hive. We also need to create aggregation tables, which will in turn be queried multiple times.

I am planning to store the 10 TB of data in a single AWS S3 bucket and query it using EMR.

Is this a feasible approach, or will the performance be poor?

What alternatives can be used to speed up the queries?

Upvotes: 1

Views: 629

Answers (1)

jarmod

Reputation: 78573

Yes, it's feasible. Querying data directly in S3, rather than hydrating HDFS first, is a very common use case. It's hard to give a definitive statement on performance because it depends on how you organize the data and what your interaction with that data looks like. I think performance per dollar is undeniably better with S3, but raw performance is likely to be better with the data stored locally, as you'd expect.
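One concrete aspect of "how you organize data" is partitioning. Hive can skip whole partitions when a query filters on the partition column, which matters on S3 because every listing and read is a network request. Here is a minimal sketch of a Hive-style partitioned key layout. The bucket name `example-bucket`, the table name `events`, and the daily partition column `dt` are all hypothetical and not taken from the question.

```python
from datetime import date, timedelta


def partition_prefix(bucket: str, table: str, day: date) -> str:
    """Return the Hive-style S3 prefix for one daily partition (dt=YYYY-MM-DD)."""
    return f"s3://{bucket}/{table}/dt={day.isoformat()}/"


def prefixes_for_range(bucket: str, table: str, start: date, end: date) -> list[str]:
    """List the prefixes covering the inclusive date range [start, end].

    With a partitioned table, a query filtered on dt only needs to read
    objects under these prefixes instead of scanning the whole table.
    """
    days = (end - start).days + 1
    return [
        partition_prefix(bucket, table, start + timedelta(days=i))
        for i in range(days)
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for prefix in prefixes_for_range(
        "example-bucket", "events", date(2020, 1, 1), date(2020, 1, 3)
    ):
        print(prefix)
```

Whether a daily partition is the right granularity depends on your query patterns. The general idea is to partition on the columns you filter on most often, so that the aggregation queries touch as little of the 10 TB as possible.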

Here are some related articles on this topic:

Things to consider when optimizing access to data in S3:

Upvotes: 3
