Jon Martin

Reputation: 3392

Amazon EFS vs S3 for distributed computing

I have a big data problem that I want to distribute over, say, 20 EC2 instances. My data set is produced locally, and I want to slice it for distribution across all of my EC2 instances. I don't quite understand the differences between block, file, and object storage, but it seems to me that mounting EFS on all EC2 instances would be more performant than copying data from S3 to individual instances. Is this assumption correct, and if so, is there a way to upload data to EFS without using the DataSync system provided by Amazon?

Upvotes: 1

Views: 3142

Answers (2)

bgdnlp

Reputation: 1145

S3 is like a web server. You upload files to it and download files from it, but you can't modify a file directly on the server. You have to download it, then modify, then put it back.
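To make the download, modify, upload round trip concrete, here is a minimal sketch. It assumes a boto3 S3 client is passed in as `s3`; the function name `append_line_s3` and the bucket and key names are made up for illustration. Note that the whole object is rewritten even though only one line changes.

```python
def append_line_s3(s3, bucket, key, line):
    """Append one line to an S3 object by rewriting the whole object.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # 1. Download the full object body into memory.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # 2. Modify the local copy.
    if body and not body.endswith(b"\n"):
        body += b"\n"
    body += line.encode("utf-8") + b"\n"

    # 3. Upload the complete new body, replacing the old object.
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)
```

With real credentials you would call it along the lines of `append_line_s3(boto3.client("s3"), "my-bucket", "data/part-0.csv", "1,2,3")`, where the bucket and key are placeholders.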

EFS, which is NFSv4, is like a disk. You can edit files directly. It's also significantly more expensive than S3. To upload files to EFS, you mount it on an EC2 instance like a normal disk.
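Once the file system is mounted, ordinary file operations work on it. A rough sketch, assuming EFS is already mounted at some directory on the instance (the path `/mnt/efs` below is an assumption, and both function names are made up for illustration):

```python
import os
import shutil

def upload_to_mount(src_path, mount_dir):
    """Copy a local file into a mounted directory, e.g. an EFS mount point."""
    # shutil.copy2 copies into mount_dir under the source's base name
    # and returns the destination path.
    return shutil.copy2(src_path, mount_dir)

def append_line(path, line):
    """Edit a file in place by appending a line; no full download or re-upload."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return os.path.getsize(path)
```

For example, `upload_to_mount("part-0.csv", "/mnt/efs")` followed by `append_line("/mnt/efs/part-0.csv", "1,2,3")`. Nothing here is EFS-specific, which is the point: to your code it is just a directory.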

That said, it sounds like the correct answer for what you're trying to do is to use EMR, like JD D suggested.

Upvotes: 3

qkhanhpro

Reputation: 5220

It depends on your specific use case and software, but here are some basic guidelines:

  • S3 is object storage. Data on S3 is served to your machines over HTTP(S).
  • EFS is file system storage, accessed using the NFSv4 protocol.

EFS is much more expensive than S3 if all you need is to write data into it and read it back.

Here is a comparison already made on Stack Overflow: "AWS EFS vs EBS vs S3 (differences & when to use?)"

Upvotes: 2
