Reputation: 397
Is there a Python client library that can parse a path, determine whether it is a local or GCS path, and read/write accordingly? E.g., so I can pass in an arbitrary path and keep my business logic agnostic to the exact storage mechanism?
TensorFlow's gfile API (tf.io.gfile) is one option, but importing all of TensorFlow just for the sake of using gfile seems like an odd design.
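For reference, this is roughly what I mean — a minimal sketch of gfile accepting local and gs:// paths interchangeably (the bucket and file names are hypothetical):

```python
import tensorflow as tf

# tf.io.gfile.GFile treats local paths and gs:// URLs uniformly,
# so the same read/write code works against disk and Cloud Storage.
with tf.io.gfile.GFile("gs://my-bucket/data.txt", "r") as f:  # hypothetical path
    contents = f.read()

with tf.io.gfile.GFile("/tmp/data.txt", "w") as f:
    f.write(contents)
```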
Upvotes: 2
Views: 658
Reputation: 147
If you want to be agnostic to the storage mechanism, there's PyFilesystem (fs), which abstracts file systems, and fs-gcsfs, its connector for Google Cloud Storage. Not to be confused with the gcsfs library from the folks who make Dask.
The two differ in several ways, and one may be more usable than the other in specific circumstances. However, the PyFilesystem one seems much more 'portable' in that its interface is unified across backends, and I've had more success with it. As a bonus, if you're careful enough, your app can work with things that aren't even actual file systems (ZIP files, MemoryFS, etc.); see the sketch below.
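A minimal sketch of what that looks like, assuming fs and fs-gcsfs are installed (the bucket and file names are hypothetical):

```python
import fs  # pip install fs fs-gcsfs

def read_text(fs_url: str, path: str) -> str:
    # fs.open_fs dispatches on the URL scheme: "gs://..." is handled by
    # fs-gcsfs, "osfs://..." (or a plain directory path) by the local
    # filesystem, and "mem://" by an in-memory MemoryFS.
    with fs.open_fs(fs_url) as filesystem:
        return filesystem.readtext(path)

# Same business logic, different storage mechanisms (hypothetical paths):
local_text = read_text("osfs:///tmp/data", "config.json")
remote_text = read_text("gs://my-bucket/data", "config.json")
```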
Upvotes: 2
Reputation: 148
You should be able to achieve this by checking the scheme attribute returned by urllib.parse.urlparse — gsutil uses gs:// to denote Cloud Storage, so you can branch on that to decide how the file gets processed. Alternatively, publicly stored GCS files begin with https://storage.googleapis.com, which you can detect by comparing the netloc attribute.
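A minimal sketch of that dispatch, assuming the google-cloud-storage client is installed and credentials are configured (bucket and path names are hypothetical):

```python
from urllib.parse import urlparse

def read_bytes(path: str) -> bytes:
    parsed = urlparse(path)
    # gs://bucket/key style URLs, as used by gsutil
    if parsed.scheme == "gs":
        from google.cloud import storage  # pip install google-cloud-storage
        client = storage.Client()
        blob = client.bucket(parsed.netloc).blob(parsed.path.lstrip("/"))
        return blob.download_as_bytes()
    # Public HTTP(S) links served from storage.googleapis.com
    if parsed.netloc == "storage.googleapis.com":
        import urllib.request
        with urllib.request.urlopen(path) as resp:
            return resp.read()
    # Anything else is treated as a local path
    with open(path, "rb") as f:
        return f.read()

data = read_bytes("gs://my-bucket/data/input.csv")  # hypothetical path
```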
Upvotes: 0