Reputation: 33
def get_config_files(self):
    dict_path = 'word.pkl'
    self.kw_ns = ConfigParser()
    self.kw_ns.add_section('Paths')
    self.kw_ns.set('Paths', 'new_df1', 'gs://' + filepath, encoding='utf-8')
    self.kw_ns.set('Paths', 'dictionary', 'gs://' + dict_path)
    new_df1 = pd.read_csv(self.kw_ns.get('Paths', 'new_df1'))
    dict = pickle.load(open(self.abs_path + self.kw_ns.get('Paths', 'dictionary'), 'rb'))
I could not read either the CSV or the pickle file; both throw a file-not-found error. I have pandas 0.25 and gcsfs installed and imported. Any pointers on how this can be accomplished?
Upvotes: 0
Views: 4121
Reputation: 13357
With gcsfs, you need to do a bit of setup, in particular open a file-like object which you can then read or write. Please see the documentation.
import gcsfs
fs = gcsfs.GCSFileSystem(project='my-google-project')
with fs.open('my-bucket/my-file.txt', 'rb') as f:
    print(f.read())
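Applied to the question, a minimal sketch along the same lines (the bucket and object names below are placeholders, not taken from the original code) could look like this:

import pickle

import gcsfs
import pandas as pd

fs = gcsfs.GCSFileSystem(project='my-google-project')

# pd.read_csv accepts a file-like object, so hand it the opened GCS blob.
with fs.open('my-bucket/data/new_df1.csv', 'rb') as f:
    new_df1 = pd.read_csv(f)

# Same idea for the pickled dictionary.
with fs.open('my-bucket/data/word.pkl', 'rb') as f:
    word_dict = pickle.load(f)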
Also beware that you may need to authenticate to access the desired project and its storage bucket. And if your program is running in Google Compute Engine (GCE), the GCE VM will need the storage-rw scope (or another scope that implies storage-rw), and the service account will need the Storage Object Admin permission.
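If you need to authenticate explicitly rather than rely on ambient credentials, gcsfs lets you pass a token; a minimal sketch, assuming a service-account key file (the path is a placeholder):

import gcsfs

# token may also be 'google_default' to pick up application-default credentials.
fs = gcsfs.GCSFileSystem(project='my-google-project',
                         token='/path/to/service-account-key.json')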
The more typical ways for a Python program to access Google Cloud Storage (GCS) are:

- A client library that accepts gs:// pathnames.
- A gsutil command-line invocation to copy a local file to or from GCS; in this case you provide gs:// pathnames. (In Python 3 I'd use the built-in subprocess library to shell out; in Python 2 I'd use the subprocess32 library from PyPI, which is a back-ported version of the same library with bug fixes.) A sketch follows this list.
- gcsfuse: run it to mount a GCS bucket (optionally narrowed to a specific "subdirectory") to a local directory, then read/write files in that local directory.
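For the gsutil route, a minimal Python 3 sketch (bucket, object, and local paths are placeholders):

import subprocess

# Copy an object from GCS to a local file by shelling out to gsutil.
subprocess.run(
    ['gsutil', 'cp', 'gs://my-bucket/data/new_df1.csv', '/tmp/new_df1.csv'],
    check=True,
)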
GCS is really a flat object store, not a file system. For example, it does not support multiple simultaneous readers and writers of a file; just atomic reads or writes of a blob.

GCS does not actually have directories, just paths that contain slash characters. With gcsfuse you can mount the bucket with --implicit-dirs, in which case it fakes the directories (and runs very slowly), or else you have to have "directory placeholders" (0-length objects with names ending in /). Without --implicit-dirs, gcsfuse will create the placeholders during certain operations but won't even see "subdirectories" that don't have them.
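For completeness, mounting a bucket with implicit directories could be scripted from Python like this (the bucket name and mount point are placeholders; gcsfuse must be installed and the mount point must exist):

import subprocess

# Mount the bucket at a local directory, faking directories from object paths.
subprocess.run(
    ['gcsfuse', '--implicit-dirs', 'my-bucket', '/mnt/my-bucket'],
    check=True,
)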
Please read the gcsfuse documentation on how its semantics differ from a file system, even while gcsfuse does its best to bridge the gap.
Upvotes: 3