Reputation: 57
I successfully migrated my data from the deprecated Cloud Datalab Deployer to the Docker Datalab container on GCP. I'm using macOS, and I can see that my home directory is mapped to the container's /content directory, so my notebooks are accessible in the Datalab Jupyter interface.
My notebooks contain a text-processing routine that loads a stop word list from a text file located in the same directory as the notebook itself.
But when I try to access one of the files in the /content directory from Python code, I get a 'File does not exist' error:
>>> stopwords_full = load_stopwords_from_file('./stopwords/verylong_words_list.txt')
IOError: File ./stopwords/verylong_words_list.txt does not exist
Indeed, when I run the ls command, I can't see the /content directory or any of its files:
>>> !ls
bin dev lib mnt proc sbin sys usr
boot etc lib64 node_modules root srcs tmp var
datalab home media opt run srv tools
In the old version of Datalab I didn't have such an error.
How can I work with files in the container's /content directory from the Datalab Jupyter Python cells?
Upvotes: 1
Views: 722
Reputation: 21
Perhaps one can also run 'git checkout' from a cell. This will, in principle, put your txt files inside the host VM, and you can adjust your path accordingly.
We had to do all sorts of hacks like this on the App Engine-deployed version of Datalab, given that the git interface on GCP has been buggy. It's not a perfect solution.
Also, it seems that the gsutil CLI is available within a cell via %%bash. You can put the file in a bucket and then copy it into the VM with gsutil.
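For anyone who prefers to do the copy from a Python cell rather than %%bash, here is a minimal sketch of the same idea. It assumes gsutil is on the PATH of the machine running the kernel; the bucket name, object path, and local path are hypothetical placeholders, and the helper names are mine, not part of Datalab.

```python
import subprocess


def build_gsutil_cp(bucket, remote_path, local_path):
    """Build the argument list for `gsutil cp gs://<bucket>/<remote_path> <local_path>`."""
    source = "gs://{}/{}".format(bucket, remote_path.lstrip("/"))
    return ["gsutil", "cp", source, local_path]


def fetch_from_bucket(bucket, remote_path, local_path):
    """Run the copy. Raises CalledProcessError if gsutil exits with an error."""
    subprocess.check_call(build_gsutil_cp(bucket, remote_path, local_path))
    return local_path


# Hypothetical bucket and paths, shown only to illustrate the command that would run.
cmd = build_gsutil_cp(
    "my-bucket",
    "stopwords/verylong_words_list.txt",
    "/tmp/verylong_words_list.txt",
)
print(" ".join(cmd))
```

Calling fetch_from_bucket with the same arguments would perform the actual copy, after which the notebook can load the stop word list from the local path.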
Upvotes: 1
Reputation: 1709
Adding my comment as the answer so that this doesn't appear as an unanswered question:
When using Datalab on GCP, I found that !hostname shows the host name of the Datalab gateway. If I use the Datalab local setup, I see the hostname of my local container. Using the Datalab local setup, I can access local files and see the content folder when I run !ls. One potential workaround for accessing files when using Datalab on GCP is to use Google Cloud Storage. There is an example at the following link which could be helpful.
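To check which of the two situations a notebook is in, something like the following could be run in a cell. This is only a rough diagnostic sketch using the standard library; the function name is my own, and /content is taken from the question rather than from any Datalab API.

```python
import os
import socket


def describe_environment(content_dir="/content"):
    """Report where the kernel is running and whether content_dir is visible to it."""
    return {
        "hostname": socket.gethostname(),
        "cwd": os.getcwd(),
        "content_dir_exists": os.path.isdir(content_dir),
    }


info = describe_environment()
for key in sorted(info):
    print("{}: {}".format(key, info[key]))
```

If content_dir_exists comes back False, the kernel is not running in a container that has the local directory mounted, and relative paths such as ./stopwords/... will not resolve to the files on the local machine.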
Upvotes: 0