jtownsend

Reputation: 55

Read SHP file from SFTP using pysftp

I am trying to use pysftp's getfo() to read a shapefile (without downloading it). However, the output I get does not seem workable, and I'm not sure if it's possible to do this with a shapefile.

Ideally I would like to read in the file and convert it to a Geopandas GeoDataFrame.

import pysftp
import io

with pysftp.Connection(host=host, username=user, password=password) as sftp:
    print("Connection established ... ")

    flo = io.BytesIO()
    sites = sftp.getfo('sites/Sites.shp', flo)
    value=flo.getvalue()

From here I can't decode the value and am unsure of how to proceed.

Upvotes: 2

Views: 443

Answers (1)

Martin Prikryl

Reputation: 202494

Something like this should do:

flo.seek(0)
df = geopandas.read_file(flo)

Though using Connection.getfo unnecessarily keeps the whole raw file in memory. More efficient would be:

with sftp.open('sites/Sites.shp', bufsize=32768) as f:
    df = geopandas.read_file(f)

(for the purpose of bufsize=32768, see Reading file opened with Python Paramiko SFTPClient.open method is slow)


Though if I understand it correctly, you need multiple files. There is no way Geopandas can magically access the other related files on a remote server when you provide the "shp" via a file-like object. Geopandas does not know where the "shp" comes from, or even what its physical name is. You need to provide file-like objects for all the individual files. See Using pyshp to read a file-like object from a zipped archive. They do not use Geopandas there, but the principle is the same.

For Geopandas, it seems that the underlying fiona library handles that, and I didn't find any documentation of the relevant parameters.

I guess something like this might do, but that's just a wild guess:

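with sftp.open('sites/Sites.shp', bufsize=32768) as shp, \
     sftp.open('sites/Sites.shx', bufsize=32768) as shx, \
     sftp.open('sites/Sites.dbf', bufsize=32768) as dbf:
    # ... open any further component files the same way
    # Unverified guess: the shx= and dbf= keywords are not documented
    df = geopandas.read_file(shp, shx=shx, dbf=dbf)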

or switch to the shapefile/pyshp module:

with sftp.open('sites/Sites.shp', bufsize=32768) as shp, \
     sftp.open('sites/Sites.shx', bufsize=32768) as shx, \
     sftp.open('sites/Sites.dbf', bufsize=32768) as dbf:
    # ... open any further component files the same way
    r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
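
If nested with-blocks get unwieldy, the same idea can be sketched as a small helper that pulls each component into its own in-memory buffer. This is only an illustration: fetch_components is a made-up name, and it assumes the connection object has an open(path) method returning a readable binary file object (pysftp.Connection.open does). Unlike the streaming approach, it holds every component fully in memory.

```python
import io


def fetch_components(sftp, base_path, extensions=("shp", "shx", "dbf")):
    """Read each shapefile component into its own BytesIO buffer.

    `sftp` is assumed to be any object with an `open(path)` method that
    returns a readable binary file object usable as a context manager.
    `base_path` is the remote path without the extension.
    """
    buffers = {}
    for ext in extensions:
        # One remote file per component, e.g. sites/Sites.shp, sites/Sites.shx
        with sftp.open(f"{base_path}.{ext}") as remote:
            buffers[ext] = io.BytesIO(remote.read())
    return buffers
```

The keys match the keyword names pyshp expects, so the result could be passed as shapefile.Reader(**fetch_components(sftp, 'sites/Sites')).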

Another trick is to pack all the files into a zip archive:
Read shapefile from HDFS with geopandas
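
As a rough sketch of the packing step only, the component files can be written into an in-memory zip with the standard library. pack_components is a hypothetical helper name, and the input is assumed to be a dict mapping archive member names to bytes that were already downloaded. How the resulting archive is handed to Geopandas or fiona depends on the installed versions, so that step is deliberately left out here.

```python
import io
import zipfile


def pack_components(components):
    """Pack {member_name: bytes} into an in-memory zip archive.

    Returns a BytesIO positioned at the start of the archive.
    """
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in components.items():
            zf.writestr(name, data)
    archive.seek(0)  # rewind so a consumer can read from the beginning
    return archive
```

All members should share the same base name (Sites.shp, Sites.shx, Sites.dbf), since that is how shapefile readers pair the components.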


Btw, note that the code downloads the file(s) anyway. You cannot parse the contents of a remote file without actually downloading those contents. The code just avoids storing the downloaded contents in a (temporary) local file.

Upvotes: 3
