Reputation: 91
I am trying to create an EMR cluster with the configuration below, but it fails at the bootstrap stage. The EMR release I am using is 5.13.0.
[
  {
    "Classification": "core-site",
    "Properties": {
      "fs.defaultFS": "s3://my-s3-bucket",
      "fs.s3a.imp": "org.apache.hadoop.fs.s3.S3FileSystem"
    }
  }
]
If I remove this configuration, the cluster is provisioned successfully. Any idea how an S3-backed HDFS configuration can be set up?
Upvotes: 2
Views: 1921
Reputation: 1805
In short, what you are trying to achieve is not possible.
Reason: HDFS is an implementation of the Hadoop FileSystem API, which is modeled on POSIX filesystem behavior.
The EMR File System (EMRFS), which all Amazon EMR clusters use for reading and writing regular files from Amazon EMR directly to Amazon S3, mimics HDFS but is an object store at its core. It therefore still violates some of the requirements of the Hadoop FileSystem API and cannot be considered a replacement for HDFS. See the "Object Stores vs. Filesystems" section in the above link.
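To make the object store point concrete, here is a toy sketch of one commonly cited difference: a directory rename. It is not EMRFS or S3 code. The `store` dictionary and the `rename_prefix` helper are hypothetical stand-ins that assume an object store behaves like a flat key-to-bytes map, so a "rename" becomes one copy plus one delete per object instead of a single atomic operation.

```python
# Toy model only (hypothetical, not EMRFS or S3 code):
# an object store is treated as a flat mapping from key to bytes.
store = {
    "logs/2018/a.txt": b"a",
    "logs/2018/b.txt": b"b",
    "other/c.txt": b"c",
}


def rename_prefix(store, src, dst):
    """Emulate a directory rename by copying and deleting each key in turn."""
    operations = 0
    # sorted() materializes the key list first, so mutating the dict is safe.
    for key in sorted(k for k in store if k.startswith(src)):
        store[dst + key[len(src):]] = store[key]  # copy the object to the new key
        del store[key]                            # delete the old object
        operations += 2
    return operations


ops = rename_prefix(store, "logs/2018/", "archive/2018/")
print(ops, sorted(store))
```

In this model a reader looking at the store between two loop iterations would see the "directory" half moved, which is the kind of behavior a POSIX-style filesystem rename is not expected to expose.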
With that said, you can still use Amazon S3 as a storage option on EMR without configuring anything, simply by using the s3:// URI scheme.
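As a rough sketch of what that can look like, the snippet below builds an EMR step definition that copies job output from the cluster's HDFS to S3 with `s3-dist-cp`, addressing the bucket by its s3:// URI and leaving `fs.defaultFS` untouched. The helper name `s3_copy_step`, the step name, and the `hdfs:///output` path are illustrative assumptions; the bucket name is taken from the question.

```python
def s3_copy_step(src, dest):
    """Build an EMR step dict that runs s3-dist-cp via command-runner.jar.

    Hypothetical helper for illustration; src and dest are plain URIs.
    """
    return {
        "Name": "Copy HDFS output to S3",  # illustrative step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", "--src", src, "--dest", dest],
        },
    }


# HDFS stays the default filesystem; S3 is addressed explicitly by URI.
step = s3_copy_step("hdfs:///output", "s3://my-s3-bucket/output")
print(step["HadoopJarStep"]["Args"])
```

A dict of this shape could then be passed in the `Steps` list of boto3's `add_job_flow_steps` call for a running cluster. The same s3:// paths can also be used directly as input or output locations in Spark, Hive, or `hadoop fs` commands on the cluster.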
Hope this answers your question.
Upvotes: 3