mirekphd

Reputation: 6791

How to delete a Jupyter Notebook input cell programmatically, using its tag?

In our largest ML modeling pipeline notebook we need to delete a single input (code) cell (containing sensitive information which we cannot pass via other means when automating its execution).

The cell was created (injected) by papermill.execute_notebook(), which is executed in another (controller) notebook, and it was auto-tagged with the injected-parameters tag.

The solution (possibly not the only one?) is deleting the cell as soon as it gets executed.

If searching for a tag makes it extra difficult, then a solution that just deletes the previous input cell (programmatically) would do.

What did not work

Hiding the input cell is not good enough, as it would still get saved to disk (this includes the report_only option in papermill's execute_notebook()). Likewise, "converting" to HTML with nbconvert (which does allow selecting cells for removal on the basis of their tags, as in this solution) would still preserve the original notebook with the encoded password inside.

Upvotes: 1

Views: 2284

Answers (2)

Eduardo

Reputation: 1413

I'm assuming you want to remove the cell because it contains sensitive information (a password).

My first advice would be not to pass sensitive information in plain text. A (slightly) better and simpler option would be to store the password in an environment variable and read it from the notebook using os.environ.
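A minimal sketch of the environment-variable approach. The variable name MODEL_DB_PASSWORD and the helper read_secret are hypothetical, chosen only for illustration; the assumption is that the controller process sets the variable before starting the run, so that a kernel launched from it would normally inherit it.

```python
import os


def read_secret(name="MODEL_DB_PASSWORD"):
    """Return the secret stored in environment variable `name`.

    `MODEL_DB_PASSWORD` is a hypothetical name used for illustration.
    """
    try:
        return os.environ[name]
    except KeyError:
        # fail loudly instead of silently continuing with an empty password
        raise RuntimeError(f"environment variable {name} is not set") from None
```

With this, the password never appears in the injected-parameters cell, so nothing sensitive gets written to the output notebook in the first place.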

Here's how to remove the cell:

Note: I wrote this code on the fly and didn't test it, might need small edits.

import nbformat

# as_version is a required argument; 4 is the current notebook format
nb = nbformat.read('/path/to/your/notebook.ipynb', as_version=4)

index = None

# find index for the cell with the injected params
for i, c in enumerate(nb.cells):
    cell_tags = c.metadata.get('tags')
    if cell_tags:
        if 'injected-parameters' in cell_tags:
            index = i

# remove cell
if index is not None:
    nb.cells.pop(index)

# save modified notebook
nbformat.write(nb, '/path/to/your/notebook.ipynb')
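For comparison, here is a sketch of the same idea that removes every cell carrying the tag rather than only the last match. It uses only the standard json module instead of nbformat, so it skips nbformat's validation, and it assumes the version 4 notebook layout (a top-level "cells" list with tags stored under each cell's metadata). The function name strip_tagged_cells is hypothetical.

```python
import json


def strip_tagged_cells(path, tag="injected-parameters"):
    """Remove every cell tagged `tag` from the notebook at `path`.

    Returns the number of cells removed. Assumes the nbformat v4 JSON
    layout; no schema validation is performed.
    """
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)

    cells = nb.get("cells", [])
    # keep only the cells whose metadata does not list the tag
    kept = [c for c in cells
            if tag not in (c.get("metadata", {}).get("tags") or [])]
    removed = len(cells) - len(kept)

    if removed:
        nb["cells"] = kept
        # overwrite the notebook in place only when something changed
        with open(path, "w", encoding="utf-8") as f:
            json.dump(nb, f, indent=1)
            f.write("\n")
    return removed
```

Calling strip_tagged_cells('/path/to/your/notebook.ipynb') would then rewrite the file without the injected cell and report how many cells were dropped.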

Upvotes: 1

mirekphd

Reputation: 6791

The answer seems to involve nbformat, and one already exists on this site. However, it belongs to a question phrased in such technical language that I think it is worth keeping my plain-English version of the question to help others discover it (I duly upvoted the other answer).

def perform_post_exec_cleanup(output_nb_name, tag_to_del='injected-parameters'):
    """Strip every cell tagged `tag_to_del` and overwrite the notebook in place."""

    import json
    from traitlets.config import Config
    from nbconvert import NotebookExporter
    import nbformat

    c = Config()
    c.TagRemovePreprocessor.enabled = True  # to enable the preprocessor
    c.TagRemovePreprocessor.remove_cell_tags = [tag_to_del]
    c.preprocessors = ['TagRemovePreprocessor']  # previously: c.NotebookExporter.preprocessors

    # re-export the notebook as a notebook, minus the tagged cells
    nb_body, resources = NotebookExporter(config=c).from_filename(output_nb_name)
    # parse the exported JSON and overwrite the original file (format version 4)
    nbformat.write(nbformat.from_dict(json.loads(nb_body)), output_nb_name, 4)

Caveats

Such notebook conversions / cell stripping can normally be done in place, in the same notebook that runs the stripping code. Not with papermill, though: the code will not work from within the output notebook while its execution is controlled by papermill's execute_notebook() function. It has to be run in the external (controller) notebook, after the function has finished or has been interrupted.

Because the output notebook is saved to disk incrementally during execution, the injected-parameters cell will already be on disk if the run fails. To make sure it is not kept permanently, run the stripping code above unconditionally, even if the papermill function failed, by putting it in the finally section of a try-except-finally block.
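The unconditional cleanup can be sketched as a small wrapper. The helper run_then_strip and its two callable arguments are hypothetical names; the wrapper itself knows nothing about papermill, it only guarantees that the stripping step runs whether or not the execution step raised.

```python
def run_then_strip(execute, strip):
    """Call execute(), then always call strip(), even if execute() raised.

    Both arguments are zero-argument callables supplied by the controller.
    Any exception from execute() still propagates after strip() has run.
    """
    try:
        return execute()
    finally:
        # runs on success, on failure and on interruption alike
        strip()
```

In the controller notebook, execute could be a lambda wrapping papermill.execute_notebook(input_path, output_path, parameters=params), and strip a lambda calling perform_post_exec_cleanup(output_path); input_path, output_path and params stand for whatever the controller already uses.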

[ based on: Run preprocessor using nbconvert as a library ]

Upvotes: 0
