Reputation: 6791
In our largest ML modeling pipeline notebook we need to delete a single input (code) cell (containing sensitive information which we cannot pass via other means when automating its execution).
The cell has been created (injected) by papermill.execute_notebook(), executed in another notebook (the controller), and has been auto-tagged with the injected-parameters tag.
The solution (possibly not the only one?) is deleting the cell as soon as it gets executed.
If searching for a tag makes this extra difficult, then a solution that just deletes the previous input cell programmatically would also do.
What did not work
Hiding the input cell is not good enough, as it would still get saved to disk (this includes the report_mode option in papermill's execute_notebook()). Also, "converting" to HTML with nbconvert (which does allow selecting cells for removal on the basis of their tags, as in this solution) would still preserve the original notebook with the encoded password inside.
Upvotes: 1
Views: 2284
Reputation: 1413
I'm assuming you want to remove the cell because it contains sensitive information (a password).
My first advice would be not to pass sensitive information in plain text. A (slightly) better and simple option would be to store the password in an environment variable and read it from the notebook using os.environ.
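As a minimal sketch of the environment-variable approach (the variable name PIPELINE_DB_PASSWORD and the helper read_secret are made up for illustration, they are not part of papermill or of the original notebook), the executed notebook could read the secret itself instead of receiving it as an injected parameter:

```python
import os


def read_secret(var_name="PIPELINE_DB_PASSWORD"):
    """Return the secret held in an environment variable, or fail loudly.

    The variable name is a hypothetical placeholder; use whatever name
    your scheduler or shell exports before the notebook is executed.
    """
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return value
```

Since the value is looked up at run time and never appears in a parameters cell, it is not written into the notebook file, as long as no cell prints or displays it.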
Here's how to remove the cell:
Note: I wrote this code on the fly and didn't test it, might need small edits.
import nbformat

path = '/path/to/your/notebook.ipynb'
# nbformat.read requires as_version; 4 is the current notebook format
nb = nbformat.read(path, as_version=4)

index = None
# find the index of the cell with the injected params
for i, c in enumerate(nb.cells):
    cell_tags = c.metadata.get('tags')
    if cell_tags and 'injected-parameters' in cell_tags:
        index = i

# remove the cell
if index is not None:
    nb.cells.pop(index)

# save the modified notebook
nbformat.write(nb, path)
Upvotes: 1
Reputation: 6791
An answer seems to involve nbformat, and it already exists on this site, but it was given to a question asked in such technical language that I think it is worth restating that question in plain English to help others discover it (I duly upvoted the other answer).
def perform_post_exec_cleanup(output_nb_name, tag_to_del='injected-parameters'):
    import json
    from traitlets.config import Config
    from nbconvert import NotebookExporter
    import nbformat

    c = Config()
    c.TagRemovePreprocessor.enabled = True  # to enable the preprocessor
    c.TagRemovePreprocessor.remove_cell_tags = [tag_to_del]
    c.preprocessors = ['TagRemovePreprocessor']  # previously: c.NotebookExporter.preprocessors
    nb_body, resources = NotebookExporter(config=c).from_filename(output_nb_name)
    nbformat.write(nbformat.from_dict(json.loads(nb_body)), output_nb_name, 4)
Caveats
It is normally possible to do such notebook conversions / cell stripping in place, in the same notebook in which the stripping code is run. NOT in the case of papermill: it will NOT work from within the output notebook while its code execution is controlled by papermill's execute_notebook() function. It has to be run in the external (controller) notebook, after the function has finished or its execution has been interrupted. Because the output notebook is saved to disk incrementally during the process, if you want to make sure the injected-parameters cell does not get saved permanently, you need to run the above stripping code unconditionally, even if the papermill function failed, so put it in the finally section of a try-except-finally block.
[ based on: Run preprocessor using nbconvert as a library ]
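The caveat about running the cleanup unconditionally could be sketched in the controller notebook as follows. The helper run_then_cleanup is a made-up name for illustration, and the notebook paths and the secret variable in the commented usage are placeholders:

```python
def run_then_cleanup(execute, cleanup):
    """Call execute(); always call cleanup() afterwards, even if execute() raises."""
    try:
        return execute()
    finally:
        # Runs on success, on failure, and on KeyboardInterrupt alike.
        cleanup()


# Hypothetical usage in the controller notebook (placeholder paths):
#
# import papermill as pm
#
# run_then_cleanup(
#     lambda: pm.execute_notebook(
#         "input.ipynb", "output.ipynb", parameters={"password": secret}
#     ),
#     lambda: perform_post_exec_cleanup("output.ipynb"),
# )
```

Any exception raised by the execution still propagates to the controller, but only after the cleanup has run. If the run fails before the output notebook is written at all, the cleanup itself may raise because the file is missing, so it may be worth checking that the file exists first.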
Upvotes: 0