Reputation: 1199
I provisioned an AWS EMR HBase cluster with 1 master and 1 core node (m5.xlarge). The cluster doesn't have any task nodes, as I plan to use it only for storage. The hdfs-site.xml file on both boxes had dfs.replication set to 1, which makes sense. I then manually added 5 more core nodes. I was hoping EMR would bump the replication factor from 1 to 2, as per their docs: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hdfs-config.html
As I understand it, EMR will set the replication factor to 2 if I supply 6 core nodes during bootstrap, but what about my case, where I manually scaled the cluster up after it was already up and running?
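For reference, this is how I read the launch-time defaults on that docs page (1 for three or fewer core nodes, 2 for 4 to 9, 3 for ten or more). The sketch below only illustrates that documented rule; it is not EMR's actual code, and the function name `default_dfs_replication` is made up for this example.

```python
def default_dfs_replication(core_nodes: int) -> int:
    """Default dfs.replication at cluster launch, as described in the EMR HDFS docs.

    Hypothetical helper for illustration only; EMR applies this rule when the
    cluster is created, not when it is resized later.
    """
    if core_nodes < 1:
        raise ValueError("a cluster needs at least one core node")
    if core_nodes >= 10:
        return 3
    if core_nodes >= 4:
        return 2
    return 1


# The two situations from the question: 1 core node at launch, 6 after scaling.
for n in (1, 6):
    print(n, "core node(s) ->", default_dfs_replication(n))
```

So a cluster launched with 6 core nodes would get 2, while mine still has the value chosen for 1 core node.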
Upvotes: 0
Views: 275
Reputation: 191874
It looks like EMR won't do this automatically. After scaling the cluster up, the replication factor has to be set manually by reconfiguring the instance group: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html
instanceGroups.json (uses the hdfs-site classification to set dfs.replication to 2):
[
  {
    "InstanceGroupId":"<ig-1xxxxxxx9>",
    "Configurations":[
      {
        "Classification":"hdfs-site",
        "Properties":{
          "dfs.replication":"2"
        },
        "Configurations":[]
      }
    ]
  }
]
aws emr modify-instance-groups --cluster-id <j-2AL4XXXXXX5T9> \
    --instance-groups file://instanceGroups.json
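If you would rather generate the file than edit it by hand, here is a minimal sketch that builds an instanceGroups.json payload setting dfs.replication through the hdfs-site classification. The helper name `build_instance_group_config` is hypothetical, and the instance group ID is the same placeholder as in the command; substitute your own.

```python
import json


def build_instance_group_config(instance_group_id: str, replication: int) -> list:
    """Build the payload for `aws emr modify-instance-groups --instance-groups`.

    Hypothetical helper: it only assembles JSON and never calls AWS.
    """
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return [
        {
            "InstanceGroupId": instance_group_id,
            "Configurations": [
                {
                    "Classification": "hdfs-site",
                    # EMR expects property values as strings.
                    "Properties": {"dfs.replication": str(replication)},
                    "Configurations": [],
                }
            ],
        }
    ]


# Placeholder ID copied from the example; redirect the output to instanceGroups.json.
print(json.dumps(build_instance_group_config("<ig-1xxxxxxx9>", 2), indent=2))
```

Note that dfs.replication only applies to files written after the change. Files already in HDFS keep their old replication factor, so you would likely also need something like `hdfs dfs -setrep -w 2 /` on the cluster to re-replicate existing data.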
Upvotes: 1