Patterson

Reputation: 2757

How to Connect Databricks to SFTP Server with PySpark

Is it possible to connect to an SFTP server from Databricks? I have looked at previous questions and answers, and according to an SO question here, it isn't possible to connect using Spark directly (at least, it wasn't possible over a year ago, according to @AlexOtt).

Is this still the case?


Edit: I ran into an issue while trying the suggested code; please take a look.

Upvotes: 1

Views: 2381

Answers (1)

JayashankarGS

Reputation: 7995

First, install the paramiko package on your Databricks cluster, then follow the steps below.
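For example, you can install it from a notebook cell with the %pip magic, which Databricks notebooks support:

%pip install paramiko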

Run the code below to connect to the SFTP server.

import paramiko

# Connection details for the public Rebex test SFTP server
host = "test.rebex.net"
port = 22
username = "demo"
password = "password"

# Create the SSH client and accept the server's host key automatically
client = paramiko.SSHClient()
client.load_system_host_keys()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(host, port=port, username=username, password=password)

# Open an SFTP session over the SSH connection
sftp = client.open_sftp()
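If your server uses key-based authentication instead of a password, only the connect step changes. A minimal sketch, assuming your private key has been uploaded to a hypothetical DBFS location:

# Load the private key through the /dbfs FUSE mount (hypothetical path)
key = paramiko.RSAKey.from_private_key_file("/dbfs/FileStore/keys/id_rsa")

# Authenticate with the key instead of a password
client.connect(host, port=port, username=username, pkey=key)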

Then, using the get function, you can download the files you want by specifying the paths as below.

# Local path must use the /dbfs FUSE mount so paramiko can write to it
local_path = "/dbfs/FileStore/tables/rd.txt"
remote_path = "/pub/example/readme.txt"

# Download the remote file, then read it back with Spark from DBFS
sftp.get(remote_path, local_path)
spark.read.text("/FileStore/tables/rd.txt").show()
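If you are not sure of the exact remote file name, you can list the remote directory first; listdir is part of paramiko's SFTPClient:

# Print the names of the files under the remote directory
for name in sftp.listdir("/pub/example"):
    print(name)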

Make sure you write the local path as above; don't use the dbfs:/FileStore/tables/rd.txt form, because paramiko writes through the driver's local filesystem and only sees DBFS through the /dbfs mount.

Output:

(screenshot: the downloaded readme.txt shown by the show() call)

Then close the connection.

# Release the SFTP session and the underlying SSH connection
sftp.close()
client.close()
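If you want the cleanup to happen even when a transfer fails, you can wrap the steps above in try/finally; a minimal sketch reusing the same names:

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
try:
    client.connect(host, port=port, username=username, password=password)
    sftp = client.open_sftp()
    try:
        # Download the file; any exception still triggers the cleanup below
        sftp.get(remote_path, local_path)
    finally:
        sftp.close()
finally:
    client.close()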

Upvotes: 1
