Reputation: 11085
I'm trying to set up a big data environment that contains Hadoop (2.7), Spark (2.3) and Ceph (Luminous).
Before I changed fs.s3a.endpoint to a domain name, everything worked as expected.
The key part of core-site.xml is shown below:
<property>
<name>fs.defaultFS</name>
<value>s3a://tpcds</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>http://10.1.2.213:8080</value>
</property>
However, I then changed fs.s3a.endpoint to a domain name, as below:
<property>
<name>fs.s3a.endpoint</name>
<value>http://gw.gearon.com:8080</value>
</property>
When I then tried to launch Spark SQL on Hadoop YARN, the following error was thrown:
AmazonHttpClient:448 - Unable to execute HTTP request: tpcds.gw.gearon.com: Name or service not known
java.net.UnknownHostException: tpcds.gw.gearon.com: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
gw.gearon.com definitely resolves to 10.1.2.213. After some searching, I found that one more property should be set:
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
<description>Enable S3 path style access ie disabling the default virtual hosting behaviour.
Useful for S3A-compliant storage providers as it removes the need to set up DNS for virtual hosting.
</description>
</property>
After setting fs.s3a.path.style.access to true, the error disappears when launching Hadoop MapReduce. However, for Spark SQL on Hadoop YARN, the error persists. I thought Spark might be overriding Hadoop's settings, so I also added spark.hadoop.fs.s3a.path.style.access true to spark-defaults.xml, but it still doesn't work.
So here is my question: the endpoint I set is http://gw.gearon.com:8080, so why does the error say that tpcds.gw.gearon.com is unknown? tpcds is the name of my Ceph bucket, which I set as fs.defaultFS, and it looks fine in core-site.xml. How can I solve this issue?
Any comments are welcome, and thanks in advance for your help.
Upvotes: 0
Views: 377
Reputation: 101
You should use "amazon naming methods", as described here and here.
That is, point a wildcard DNS CNAME to the name of the gateway(s) (a CNAME target must be a host name, not an IP address):
*.gw.gearon.com CNAME gw.gearon.com
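To see why the wildcard is needed, here is a minimal Python sketch of the two S3 addressing styles. It is only an illustration of the naming scheme, not the code the AWS SDK or S3A actually runs; the function name request_url and the object key data.parquet are made up for the example.

```python
from urllib.parse import urlsplit


def request_url(endpoint, bucket, key, path_style):
    """Build the URL an S3-style client would request for bucket/key.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(endpoint)
    if path_style:
        # Path style: the bucket is the first path segment, the host is unchanged.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket is prepended to the endpoint host,
    # so "<bucket>.<endpoint host>" has to resolve in DNS.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


endpoint = "http://gw.gearon.com:8080"
print(request_url(endpoint, "tpcds", "data.parquet", path_style=False))
# http://tpcds.gw.gearon.com:8080/data.parquet
print(request_url(endpoint, "tpcds", "data.parquet", path_style=True))
# http://gw.gearon.com:8080/tpcds/data.parquet
```

With virtual-hosted style the client looks up tpcds.gw.gearon.com, which is the name in the UnknownHostException. A wildcard record makes every bucket name under gw.gearon.com resolve to the gateway.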
Also be sure to configure that name on the gateways (documentation here). The value below is from a different setup, so use your own gateway name:
rgw dns name = clover.voxelgroup.net
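The reason the gateway needs to know its own DNS name can be sketched as follows. This is a simplified Python illustration of how a gateway could recover the bucket from the Host header, not Ceph's actual implementation; bucket_from_host is a hypothetical helper and it only handles plain host names with an optional port.

```python
def bucket_from_host(host_header, dns_name):
    """Return the bucket encoded in a Host header, or None if there is none.

    Hypothetical helper: dns_name plays the role of the configured gateway name.
    """
    host = host_header.split(":", 1)[0].lower()  # drop the port, if any
    suffix = "." + dns_name.lower()
    if host.endswith(suffix) and len(host) > len(suffix):
        # Everything in front of ".<dns_name>" is treated as the bucket.
        return host[: -len(suffix)]
    # Host is the gateway itself or an unrelated name: no bucket in the host.
    return None


print(bucket_from_host("tpcds.gw.gearon.com:8080", "gw.gearon.com"))          # tpcds
print(bucket_from_host("gw.gearon.com:8080", "gw.gearon.com"))                # None
print(bucket_from_host("tpcds.gw.gearon.com:8080", "clover.voxelgroup.net"))  # None
```

The last call shows the failure mode: if the configured name does not match the host that clients actually use, the bucket cannot be extracted from the host.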
Upvotes: 1