Reputation: 7576
I'm trying to load some data from an Amazon S3 bucket with the following code:
SparkConf sparkConf = new SparkConf().setAppName("Importer");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext sqlContext = new HiveContext(ctx.sc());
DataFrame magento = sqlContext.read().json("https://s3.eu-central-1.amazonaws.com/*/*.json");
The last line, however, throws an error:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: https
The same line has been working in another project, so what am I missing? I'm running Spark on a Hortonworks CentOS VM.
Upvotes: 9
Views: 7844
Reputation: 1596
By default Spark supports HDFS, S3, and the local file system. S3 can be accessed via the s3a:// or s3n:// protocols (see: difference between s3a, s3n and s3 protocols).
So the best way to access a file is to use the following path format:
s3a://bucket-name/key
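For example, here is a minimal sketch of the original snippet rewritten to use s3a:// (the bucket name, credential values and endpoint are placeholders you would adapt to your own setup):
SparkConf sparkConf = new SparkConf().setAppName("Importer");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext sqlContext = new HiveContext(ctx.sc());
// Provide AWS credentials and the regional endpoint to the S3A connector via the Hadoop configuration
ctx.hadoopConfiguration().set("fs.s3a.access.key", "YOUR_ACCESS_KEY");
ctx.hadoopConfiguration().set("fs.s3a.secret.key", "YOUR_SECRET_KEY");
ctx.hadoopConfiguration().set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com");
// Read the JSON files with the s3a:// scheme instead of https://
DataFrame magento = sqlContext.read().json("s3a://bucket-name/*.json");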
Depending on your Spark version and the libraries included on your classpath, you may need to add external JARs:
Spark read file from S3 using sc.textFile ("s3n://...)
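If the S3A classes are missing at runtime, one common approach is to pull in the hadoop-aws connector when submitting the job. A sketch, assuming a Hadoop 2.7.x build (match the hadoop-aws version to your Hadoop distribution; the class and jar names here are hypothetical):
spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.3 --class Importer importer.jar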
(Are you sure you were using S3 with the https protocol in previous projects? Maybe you had custom code or JARs included to support the https scheme?)
Upvotes: 1