Reputation: 5
I have written a Jupyter Notebook within AWS SageMaker Studio Classic that imports a helper script from a local utils folder (complete with an __init__.py). Simplified, I have the following folder structure:
root/
├─ conversion_scores/
│ ├─ data_preparation.ipynb
│ ├─ utils/
│ │ ├─ __init__.py
│ │ ├─ SnowflakeSetup.py
│ ├─ data/
│ │ ├─ testsub/
│ │ │ ├─ to_predict/
│ │ │ │ ├─ output_file.csv
Within data_preparation.ipynb, I do two things that are relevant here.

First, I import SnowflakeSetup.py:

import os
import sys
sys.path.append(os.path.join(os.getcwd(), "utils"))
import SnowflakeSetup

Second, I write the output_file.csv file to the data folder:

pathname = os.path.join(os.getcwd(), "data", "testsub", "to_predict", f"{end_date.strftime('%Y-%m-%d')}_prediction_run_data.csv")
df.to_csv(pathname, index=False)
When I run this Jupyter notebook manually, it works fine, as I work within root/, and thus e.g. os.path.join(os.getcwd(), "utils") resolves to the correct path /root/chm-conversion-scores/utils.
However, I would like to have this notebook run on a schedule, and the scheduled job apparently does not operate in the same folder. The same os.path.join(os.getcwd(), "utils") expression then resolves to /opt/ml/input/data/sagemaker_headless_execution/utils, where naturally the script cannot be found, leading to:

ModuleNotFoundError: No module named 'SnowflakeSetup'
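For reference, a minimal diagnostic cell (nothing here is specific to my setup) that makes the mismatch visible in both environments:

import os

# Interactively this prints /root/chm-conversion-scores; in the scheduled
# job it prints /opt/ml/input/data/sagemaker_headless_execution.
print("cwd:", os.getcwd())
print("resolved utils path:", os.path.join(os.getcwd(), "utils"))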
I assume the same problem would occur when writing the .csv file into a local subfolder, although I never got that far.
I tried hardcoding the path as /root/chm-conversion-scores/utils, but this does not work either. I also tried using
project_root = os.getenv('PROJECT_ROOT', os.getcwd())
sys.path.append(os.path.join(project_root, "utils"))
import SnowflakeSetup
and
from pathlib import Path
project_root = Path('/root/chm-conversion-scores')
sys.path.append(str(project_root / "utils"))
import SnowflakeSetup
both to no avail.
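For completeness, a variant I also considered, which walks up from the current working directory looking for a utils folder; it can of course only help if the folder exists somewhere on the machine in the first place:

from pathlib import Path
import sys

# Search the current directory and all of its parents for a "utils"
# folder and append the first match to sys.path. If the folder was
# never copied to the instance, the import below still fails.
for parent in [Path.cwd(), *Path.cwd().parents]:
    candidate = parent / "utils"
    if candidate.is_dir():
        sys.path.append(str(candidate))
        break

import SnowflakeSetup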
How can I read from and write to the local folder structure even in an automated job?
Upvotes: 0
Views: 226
Reputation: 1152
Notebook jobs run on ephemeral instances, similar to running a SageMaker training job, so they do not have access to the local EFS volume or the rest of the files in your directory, only the .ipynb file itself. You can refer to the Additional (file or folder) dependencies field under Custom options here.
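If you set the job up from code rather than through the Studio UI, the SageMaker Python SDK's NotebookJobStep exposes what looks like the same option via its additional_dependencies parameter. A sketch, with the image URI and role as placeholders:

from sagemaker.workflow.notebook_job_step import NotebookJobStep

# Placeholder values: replace image_uri and role with your own.
step = NotebookJobStep(
    name="DataPreparationJob",
    input_notebook="conversion_scores/data_preparation.ipynb",
    # Ship the utils folder with the job so it is staged next to the
    # notebook on the ephemeral instance.
    additional_dependencies=["conversion_scores/utils"],
    image_uri="<sagemaker-distribution-image-uri>",
    kernel_name="python3",
    role="<execution-role-arn>",
)

Because dependencies are staged alongside the notebook, the original os.path.join(os.getcwd(), "utils") should then resolve again.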
You have two options for the input file: attach it (along with the utils folder) as an additional dependency when creating the job, or upload it to S3 and read it from S3 inside the notebook. In either case, though, you won't be able to write the output csv file to local storage (EFS). It will be uploaded to S3, and you can download the file from there.
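For example, instead of writing into the local folder tree, you can write the dataframe straight to S3 (the bucket name and key prefix below are placeholders; df and end_date come from your notebook):

from io import StringIO
import boto3

# Placeholder bucket and prefix; replace with your own.
bucket = "my-conversion-scores-bucket"
key = f"to_predict/{end_date.strftime('%Y-%m-%d')}_prediction_run_data.csv"

# Serialize the dataframe in memory and upload it directly to S3.
buffer = StringIO()
df.to_csv(buffer, index=False)
boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())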
Upvotes: 0