sag

Reputation: 5451

Reading files via SFTP in Spark

Is it possible to read a file over SFTP in Spark?

I tried val df = sc.textFile("sftp://user:password@host/home/user/sample.csv")

But calling df.count fails with the error below:

scala> df.count
java.io.IOException: No FileSystem for scheme: sftp
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)

Is there any way to read a file over SFTP in Spark?

Upvotes: 2

Views: 6592

Answers (2)

sag

Reputation: 5451

We've created a simple Spark SFTP connector for this.

The source is on GitHub: https://github.com/springml/spark-sftp

It is also published on spark-packages: http://spark-packages.org/package/springml/spark-sftp
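As a rough sketch of how the URL from the question could map onto the connector: the helper below (SftpOptions.fromUri, a hypothetical name that is not part of spark-sftp) splits an sftp:// URI into reader options and a remote path. The option keys (host, username, password, fileType) and the format name com.springml.spark.sftp are taken from the connector's README as I remember it, so check them against the version you install.

```scala
import java.net.URI

object SftpOptions {
  // Hypothetical helper, not part of spark-sftp: turns an sftp:// URI into
  // (reader options, remote path). Option keys are assumed from the README.
  def fromUri(uri: String): (Map[String, String], String) = {
    val u = new URI(uri)
    require(u.getScheme == "sftp", s"expected an sftp:// URI, got: $uri")

    // The user-info part looks like "user:password"; the password may be absent.
    val (user, password) =
      Option(u.getUserInfo).getOrElse("").split(":", 2) match {
        case Array(n, p) => (n, p)
        case Array(n)    => (n, "")
        case _           => ("", "")
      }

    val opts = Map(
      "host"     -> u.getHost,
      "username" -> user,
      "password" -> password,
      "fileType" -> "csv" // assumed: the file in the question is a CSV
    )
    (opts, u.getPath)
  }
}

// Usage in spark-shell on Spark 1.6, with the package on the classpath
// (left as comments because it needs a reachable SFTP server):
// val (opts, path) = SftpOptions.fromUri("sftp://user:password@host/home/user/sample.csv")
// val df = sqlContext.read.format("com.springml.spark.sftp").options(opts).load(path)
```

Note that this returns a DataFrame rather than the RDD[String] that sc.textFile produces.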

Upvotes: 3

zero323

Reputation: 330093

This does not appear to be possible at the moment (Spark 1.6, whose newest build profile is hadoop-2.6). SFTP support will be introduced in Hadoop 2.8 (see HADOOP-5732).
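For context, the exception in the question is raised because Hadoop resolves a URI scheme through the fs.<scheme>.impl configuration key and finds nothing registered for sftp. A minimal, untested sketch of what the original call might look like once Spark runs against Hadoop 2.8 or later, assuming the SFTP file system class is not registered by default and has to be named explicitly:

```scala
// Run in spark-shell. Assumes Spark is built against Hadoop 2.8+, where
// org.apache.hadoop.fs.sftp.SFTPFileSystem is available in hadoop-common.
sc.hadoopConfiguration.set("fs.sftp.impl", "org.apache.hadoop.fs.sftp.SFTPFileSystem")

// Same call as in the question. Whether credentials embedded in the URI are
// honored may depend on the Hadoop version, so treat this as an assumption.
val df = sc.textFile("sftp://user:password@host/home/user/sample.csv")
df.count
```

On Hadoop 2.6 and earlier this will still fail, because the class is not on the classpath.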

Upvotes: 2
