Alexander Usoltsev

Reputation: 57

Can't access a file in the Datalab container from a Jupyter Python cell

I successfully migrated my data from the deprecated Cloud Datalab Deployer to the Datalab Docker container with GCP. I'm using macOS, and I can see that my home directory is mounted to the container's /content directory, so my notebooks are accessible in the Datalab Jupyter interface.

My notebooks have a text-processing routine that loads a stop-word list from a text file located in the same directory as the notebook itself. But when I try to access one of the files in the /content directory from Python code, I get a 'File does not exist' error:

>>> stopwords_full = load_stopwords_from_file('./stopwords/verylong_words_list.txt')

IOError: File ./stopwords/verylong_words_list.txt does not exist

Indeed, when I run the ls command, I can't see the /content directory or any of its files:

>>> !ls
bin      dev   lib    mnt           proc  sbin  sys    usr
boot     etc   lib64  node_modules  root  srcs  tmp    var
datalab  home  media  opt           run   srv   tools
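A quick way to check what the kernel actually sees is to ask Python where a relative path resolves. Relative paths passed to open() are resolved against the kernel process's current working directory, which need not be the directory that contains the notebook. The minimal sketch below uses only the standard library; the helper name describe_path is hypothetical and the path is the one from the question.

```python
import os


def describe_path(relative_path):
    """Report how a relative path resolves against the kernel's working directory."""
    cwd = os.getcwd()
    # Relative paths are resolved against the process's current working
    # directory, not against the location of the notebook file.
    absolute = os.path.abspath(relative_path)
    return {
        "cwd": cwd,
        "resolved": absolute,
        "exists": os.path.exists(absolute),
    }


if __name__ == "__main__":
    info = describe_path("./stopwords/verylong_words_list.txt")
    for key, value in info.items():
        print("%s: %s" % (key, value))
```

If "cwd" prints / rather than /content, the relative path is being looked up in the wrong place, which is consistent with the ls listing above.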

I didn't get this error in the old version of Datalab.

How can I work with files in the container's /content directory from Datalab Jupyter Python cells?

Upvotes: 1

Views: 722

Answers (2)

Jose Celaya

Reputation: 21

Perhaps you could also run 'git checkout' from a cell. In principle, this puts your .txt files inside the host VM, and you can adjust your path accordingly.

We had to do all sorts of hacks like this on the App Engine-deployed version of Datalab, given that the git interface on GCP has been buggy. It's not a perfect solution.

Also, it seems the gsutil CLI is available within a cell via %%bash. You can put the file in a bucket and then copy it into the VM with gsutil.
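The same gsutil copy can also be scripted from a Python cell with subprocess instead of a %%bash cell. This is a minimal sketch that assumes gsutil is installed and authenticated in the environment running the kernel; the function names and the bucket and object names are hypothetical placeholders.

```python
import subprocess


def gsutil_cp_command(source, destination):
    """Build the argument list for `gsutil cp SOURCE DESTINATION`."""
    return ["gsutil", "cp", source, destination]


def copy_from_bucket(source, destination):
    """Copy an object out of Cloud Storage by shelling out to gsutil.

    Assumes the gsutil CLI is on PATH and already authenticated.
    """
    cmd = gsutil_cp_command(source, destination)
    # check_call raises CalledProcessError if gsutil exits with a non-zero status.
    subprocess.check_call(cmd)
    return destination
```

Usage, with hypothetical names: copy_from_bucket("gs://my-example-bucket/stopwords/verylong_words_list.txt", "/tmp/verylong_words_list.txt"), after which the file can be loaded from the absolute path /tmp/verylong_words_list.txt. Passing an argument list rather than a shell string avoids quoting problems with unusual object names.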

Upvotes: 1

Anthonios Partheniou

Reputation: 1709

Adding my comment as the answer so that this doesn't appear as an unanswered question:

When using Datalab on GCP, I found that !hostname shows the host name of the Datalab gateway. With the local Datalab setup, I see the host name of my local container instead, and I can access local files and see the content folder when I run !ls. One potential workaround for accessing files when using Datalab on GCP is to use Google Cloud Storage. There is an example at the following link which could be helpful.
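As a rough illustration of the Google Cloud Storage route, the sketch below reads the stop-word file straight from a bucket using the google-cloud-storage client library. That library is an assumption (it is not necessarily what the linked example uses), as are application default credentials in the notebook environment and a one-word-per-line file format; parse_stopwords and load_stopwords_from_gcs are hypothetical helpers.

```python
def parse_stopwords(text):
    """Turn newline-separated text into a set of stop words.

    Assumes one word per line; blank lines and surrounding whitespace are ignored.
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def load_stopwords_from_gcs(bucket_name, blob_name):
    """Download a stop-word file from Cloud Storage and parse it.

    Requires the google-cloud-storage package and credentials available
    to the kernel; both are assumptions about the environment.
    """
    # Imported lazily so parse_stopwords works even without the package.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return parse_stopwords(blob.download_as_text())
```

Usage, with hypothetical names: stopwords_full = load_stopwords_from_gcs("my-example-bucket", "stopwords/verylong_words_list.txt"). Keeping the parsing separate from the download means the parsing can be tested without network access.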

Upvotes: 0
