Reputation: 55
I am trying to use pysftp's getfo() to read a shapefile (without downloading it). However, the output I get does not seem workable, and I'm not sure if it's possible to do this with a shapefile.
Ideally I would like to read in the file and convert it to a Geopandas GeoDataFrame.
    import pysftp
    import io

    # "pass" is a reserved word in Python, so the password variable is named "password"
    with pysftp.Connection(host=host, username=user, password=password) as sftp:
        print("Connection established ... ")
        flo = io.BytesIO()
        sites = sftp.getfo('sites/Sites.shp', flo)
        value = flo.getvalue()
From here I can't decode the value and am unsure of how to proceed.
Upvotes: 2
Views: 443
Reputation: 202494
Something like this should do:
    flo.seek(0)
    df = geopandas.read_file(flo)
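The seek(0) is needed because writing into the BytesIO leaves its position at the end of the data, so a reader that starts from the current position finds nothing. A minimal standard-library sketch of that behaviour, where the write() call is only a stand-in for what getfo() does to the buffer:

```python
import io

flo = io.BytesIO()
# Stand-in for sftp.getfo('sites/Sites.shp', flo): the download is written into the buffer.
flo.write(b"pretend these are the shapefile bytes")

# The position now sits at the end of the buffer, so reading yields nothing.
at_end = flo.read()

# Rewinding makes the whole buffer readable again.
flo.seek(0)
after_rewind = flo.read()

print(at_end)
print(after_rewind)
```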
Though using Connection.getfo unnecessarily keeps the whole raw file in memory. More efficient would be:
    with sftp.open('sites/Sites.shp', bufsize=32768) as f:
        df = geopandas.read_file(f)
(For the purpose of bufsize=32768, see Reading file opened with Python Paramiko SFTPClient.open method is slow.)
Though if I understand it correctly, you need multiple files. There's no way Geopandas can magically access the other related files on a remote server when you provide the "shp" via a file-like object. Geopandas does not know where the "shp" comes from, or even what its physical name is. You need to provide file-like objects for all the individual files. See Using pyshp to read a file-like object from a zipped archive: they do not use Geopandas, but the principle is the same.
For Geopandas, it seems that the underlying fiona library handles that, and I didn't find any documentation of the relevant parameters.
I guess something like this might do, but that's just a wild guess:
    with sftp.open('sites/Sites.shp', bufsize=32768) as shp, \
         sftp.open('sites/Sites.shx', bufsize=32768) as shx, \
         sftp.open('sites/Sites.dbf', bufsize=32768) as dbf:
        ...
        # shx=/dbf= are the guessed parameters; "..." stands for any further components
        df = geopandas.read_file(shp, shx=shx, dbf=dbf)  # , ...
Or switch to the shapefile/pyshp module:
    with sftp.open('sites/Sites.shp', bufsize=32768) as shp, \
         sftp.open('sites/Sites.shx', bufsize=32768) as shx, \
         sftp.open('sites/Sites.dbf', bufsize=32768) as dbf:
        ...
        r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
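If the list of component files grows, the chained with statement gets unwieldy. One way to open them all is contextlib.ExitStack. The sketch below is only an illustration: open_shapefile_parts and fake_open are hypothetical names, and the in-memory dictionary stands in for the remote server so the snippet runs without a connection.

```python
import contextlib
import io
import posixpath


def open_shapefile_parts(opener, shp_path, extensions=("shp", "shx", "dbf")):
    """Open every component file next to shp_path.

    opener is any callable that takes a path and returns a file-like
    context manager. Returns (stack, parts), where parts maps each
    extension to its open file and closing the stack closes them all.
    """
    base, _ = posixpath.splitext(shp_path)
    stack = contextlib.ExitStack()
    parts = {}
    try:
        for ext in extensions:
            parts[ext] = stack.enter_context(opener(f"{base}.{ext}"))
    except Exception:
        # Close whatever was already opened before re-raising.
        stack.close()
        raise
    return stack, parts


# Hypothetical stand-in for the remote server, so the sketch runs offline.
fake_remote = {
    "sites/Sites.shp": b"shp bytes",
    "sites/Sites.shx": b"shx bytes",
    "sites/Sites.dbf": b"dbf bytes",
}


def fake_open(path):
    return io.BytesIO(fake_remote[path])


stack, parts = open_shapefile_parts(fake_open, "sites/Sites.shp")
with stack:
    contents = {ext: f.read() for ext, f in parts.items()}

print(contents)
```

With a real connection, the opener could presumably be functools.partial(sftp.open, bufsize=32768), and the resulting dictionary passed on as shapefile.Reader(**parts) inside the with block.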
Another trick is to pack all the files into a zip archive; see Read shapefile from HDFS with geopandas.
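For the zip route, the archive can be unpacked entirely in memory and each component handed over as its own file-like object. A minimal sketch using only the standard library; shapefile_parts_from_zip is a hypothetical helper, and the demo archive is filled with dummy bytes instead of a real shapefile:

```python
import io
import posixpath
import zipfile


def shapefile_parts_from_zip(zip_file, extensions=("shp", "shx", "dbf")):
    """Return {extension: BytesIO} for the first matching member of each kind."""
    parts = {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            ext = posixpath.splitext(name)[1].lstrip(".").lower()
            if ext in extensions and ext not in parts:
                parts[ext] = io.BytesIO(archive.read(name))
    return parts


# Build a small demo archive in memory (dummy contents, not a valid shapefile).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for ext in ("shp", "shx", "dbf"):
        archive.writestr(f"Sites.{ext}", f"dummy {ext}".encode())
buffer.seek(0)

parts = shapefile_parts_from_zip(buffer)
print(sorted(parts))
```

With a real archive, buffer would be the file-like object obtained from the server, and the result could be passed on as shapefile.Reader(**parts).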
Btw, note that the code downloads the file(s) anyway. You cannot parse the contents of a remote file without actually downloading them. The code just avoids storing the downloaded contents in a (temporary) local file.
Upvotes: 3