Tickets2Moontown

Reputation: 137

Geopandas read_file() function very slow over a shared network drive. How to run the application without reading files over the shared network?

My question is about the speed of the read_file() / from_file() functions in Geopandas.

The program I created needs to read in files of around 900,000 rows. When I specify a path on my own PC, it takes around 3 minutes.

Through the shared drive, however, it takes 45+ minutes, which is not ideal for an application.

I know the VPN makes the whole process incredibly slow, but how do companies run applications that access huge datasets without it taking an unreasonable amount of time? Do they use dedicated servers?

My end goal is to make the application I made accessible to everyone who uses the shared drive, without them having to download the files and alter the code to read them in.

Thanks for your help

Upvotes: 0

Views: 1542

Answers (1)

Pieter

Reputation: 1484

Not sure if it will solve your specific problem, but you can generally speed up geopandas I/O by installing the pyogrio library and using it as the I/O engine to read the data. It is a lot faster in general, though I'm not sure it will make a significant difference for your specific issue.

Adding the use_arrow=True parameter as well should give another speedup, but then you'll also have to install the pyarrow library:

import geopandas as gpd

gdf = gpd.read_file(r'path/to/file.gpkg', engine='pyogrio', use_arrow=True)
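If you want to measure what the switch gains you, here is a minimal timing sketch comparing the default engine with pyogrio + Arrow on the same file. The path is a placeholder, and both pyogrio and pyarrow can be installed with pip:

import time
import geopandas as gpd

path = r'path/to/file.gpkg'  # placeholder: point this at your file on the shared drive

# Default engine (fiona on older geopandas versions, pyogrio on 1.0+)
start = time.time()
gdf = gpd.read_file(path)
print(f"default engine: {time.time() - start:.1f} s")

# pyogrio engine with Arrow transfer (requires pyogrio and pyarrow)
start = time.time()
gdf = gpd.read_file(path, engine='pyogrio', use_arrow=True)
print(f"pyogrio + use_arrow: {time.time() - start:.1f} s")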

Upvotes: 1
