JustSid

Reputation: 25318

How to get ID of cell in Jupyter kernel?

I'm trying to build a Jupyter kernel for a language that doesn't really support a REPL, and re-defining a variable or function throws an error in that language. Unfortunately that means that I can't just keep executing the code in the order the user submits it, but instead need to substitute it if they re-visit an older cell. Let's say the user has the following two cells:

Cell 1:

int foo = 1;

Cell 2:

vec4(foo);

In my ideal scenario, I just want to stitch the cells together into one virtual source file that is in cell order and then execute that. So the resulting virtual source file should be this:

int foo = 1;
vec4(foo);

Now let's say the user goes back to cell 1 and changes foo to 4. How can I find out that the user edited cell 1? Ideally, I want to update the virtual source file to look like this:

int foo = 4;
vec4(foo);

Instead of this:

int foo = 1;
vec4(foo);
int foo = 4; // This would throw an error in the language compiler

I'm using this as my base and I've looked through the source but have been unable to find anything that would help me. Is there something that I missed? Anything else that I should be doing instead?

Upvotes: 2

Views: 3971

Answers (1)

jgoday

Reputation: 2836

Here is a possible solution using the messaging API (https://jupyter-client.readthedocs.io/en/latest/messaging.html#history).

import asyncio
from uuid import uuid4
from dataclasses import dataclass

from tornado.escape import json_encode, json_decode, url_escape
from tornado.websocket import websocket_connect
from tornado.httpclient import AsyncHTTPClient, HTTPRequest

client = AsyncHTTPClient()
# Example values: replace them with the session, token and kernel of your running Jupyter server
session_id = 'faf69f76-6667-45d6-a38f-32460e5d7f24'
token = 'e9e267d0c802017c22bc31d276b675b4f5b3e0f180eb5c8b'
kernel_id = 'fad149a5-1f78-4827-ba7c-f1fde844f0b2'

@dataclass
class Cell:
    code: str
    index: int
    execution_count: int

# We keep track of all cells to maintain an up-to-date index
cells = []

async def get_sessions():
    url = 'http://localhost:8888/api/sessions?token={}'.format(token)
    req = HTTPRequest(url=url)
    resp = await client.fetch(req)
    print(resp)
    print(resp.body)

async def get_notebook_content(path):
    url = 'http://localhost:8888/api/contents/{}?token={}'.format(path, token)

    req = HTTPRequest(url=url)
    resp = await client.fetch(req)
    return json_decode(resp.body)

async def get_session(session_id):
    ses_url = 'http://localhost:8888/api/sessions/{}?token={}'.format(session_id, token)
    ses_req = HTTPRequest(url=ses_url)
    resp = await client.fetch(ses_req)
    return json_decode(resp.body)

# Return the notebook cells as a list of Cell dataclass instances
def parse_cells(content):
    res = []
    # Iterate over the notebook cells
    cells = content['content']['cells']
    for index, c in enumerate(cells):
        # Markdown and raw cells have no execution_count, so default to None
        cell_execution_count = c.get('execution_count')
        code = c['source']
        cell = Cell(code=code, index=index, execution_count=cell_execution_count)

        res.append(cell)

    return res

# listen to all notebook messages
async def listen():
    session_data = await get_session(session_id)
    notebook_path = session_data['notebook']['path']
    notebook_content = await get_notebook_content(notebook_path)

    # parse existing cells
    cells = parse_cells(notebook_content)

    # listen to all messages
    req = HTTPRequest(
        url='ws://localhost:8888/api/kernels/{}/channels?token={}'.format(
            kernel_id,
            token))
    ws = await websocket_connect(req)
    print('Connected to kernel websocket')
    hist_msg_id = None

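    while True:
        msg = await ws.read_message()
        if msg is None:
            # read_message() returns None once the websocket is closed
            break
        msg = json_decode(msg)
        msg_type = msg['msg_type']
        # parent_header can be empty (e.g. for status messages at kernel startup)
        parent_msg_id = msg['parent_header'].get('msg_id')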

        if msg_type == 'execute_input':
            # After a cell is executed, we request the history (only the most recently executed cell)
            hist_msg_id = uuid4().hex
            ws.write_message(json_encode({
                'header': {
                    'username': '',
                    'version': '5.3',
                    'session': '',
                    'msg_id': hist_msg_id,
                    'msg_type': 'history_request'
                },
                'parent_header': {},
                'channel': 'shell',
                'content': {
                    'output': False,
                    'raw': True,
                    'hist_access_type': 'tail',
                    'n': 1
                },
                'metadata': {
                },
                'buffers': []
            }))
        elif parent_msg_id == hist_msg_id and msg_type == 'history_reply':
            # We receive the history of the last executed cell, including its execution_count
            hist_msg_id = None  # we don't expect more replies
            # See message type 'history_reply': https://jupyter-client.readthedocs.io/en/latest/messaging.html#history
            history = msg['content']['history']
            if not history:
                continue
            # With output=False, each entry is a (session, line_number, input) tuple
            execution_count = history[0][1]
            code = history[0][2]
            # Update the existing cell
            for c in cells:
                # Cells that were never executed have no execution_count
                if c.execution_count is not None and c.execution_count + 1 == execution_count:
                    c.code = code
                    c.execution_count = execution_count

                    print('# Cell changed: {}'.format(c))

if __name__ == '__main__':
    asyncio.run(listen())

Let me try to explain it:

  • We keep track of every notebook cell and its index in a list (cells) of Cell dataclass instances (code, index and execution_count).

  • We listen for every message from the desired session (function listen).

  • When a cell gets executed, we request its history through the messaging API to get the code and execution_count.

  • We match it with the existing cell through its execution_count (the cell whose stored count is one less than the new one) and update that cell.
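The matching step from the list above can be isolated into a small function, which makes it easier to test without a running server. This is only a sketch: update_cell_from_history is a hypothetical helper name, and the history entry is assumed to be the (session, line_number, input) tuple that a history_reply returns when output is False. Note that the rule only finds the right cell if that cell was also the most recently executed one before the edit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cell:
    code: str
    index: int
    execution_count: Optional[int]


def update_cell_from_history(cells: List[Cell],
                             entry: Tuple[int, int, str]) -> Optional[Cell]:
    """Apply one history entry (session, line_number, input) to the tracked cells.

    Hypothetical helper: it uses the same rule as the listener above, i.e. the
    re-executed cell is the one whose stored execution_count is exactly one
    less than the new one. Returns the updated cell, or None if nothing matched.
    """
    _session, execution_count, code = entry
    for cell in cells:
        if cell.execution_count is not None and cell.execution_count + 1 == execution_count:
            cell.code = code
            cell.execution_count = execution_count
            return cell
    return None


# Cell 2 was run first (count 1), then cell 1 (count 2).
cells = [
    Cell(code="int foo = 1;", index=0, execution_count=2),
    Cell(code="vec4(foo);", index=1, execution_count=1),
]
# The user edits cell 1 and runs it again, so the kernel reports count 3.
changed = update_cell_from_history(cells, (0, 3, "int foo = 4;"))
print(changed)
```

If no tracked cell has the expected previous count, the function returns None, which a caller could treat as a hint that a new cell was inserted.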

I know it's a rather peculiar solution, but when the notebook communicates through the messaging API, it doesn't include any information about cell identity, only the cell's code.

Important note

This solution does not handle inserted or deleted cells; for that we would have to find a solution using the kernel history or something similar.
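One possible way around this, sketched here under assumptions rather than tested against a live server, is to fetch the notebook again (for example with get_notebook_content) whenever a cell is executed and rebuild the whole virtual source file from the cells in notebook order. build_virtual_source is a hypothetical helper, and the approach assumes the notebook has been saved, because the contents API returns the file as stored on disk and will not reflect unsaved edits.

```python
from typing import Any, Dict, List


def build_virtual_source(notebook_content: Dict[str, Any]) -> str:
    """Join the source of all code cells, in notebook order, into one string.

    Hypothetical helper: expects the JSON structure returned by
    /api/contents/<path>, where the cells live under content -> cells.
    """
    parts: List[str] = []
    for cell in notebook_content['content']['cells']:
        if cell.get('cell_type') != 'code':
            continue  # skip markdown and raw cells
        source = cell['source']
        # nbformat allows source to be a single string or a list of lines
        if isinstance(source, list):
            source = ''.join(source)
        if source.strip():
            parts.append(source.rstrip('\n'))
    return '\n'.join(parts)


example = {
    'content': {
        'cells': [
            {'cell_type': 'code', 'source': 'int foo = 4;', 'execution_count': 3},
            {'cell_type': 'markdown', 'source': 'Some notes'},
            {'cell_type': 'code', 'source': ['vec4(foo);'], 'execution_count': 2},
        ]
    }
}
virtual_source = build_virtual_source(example)
print(virtual_source)
```

Because the file is rebuilt from scratch each time, inserted, deleted and reordered cells need no special handling, at the cost of one extra HTTP request per execution.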

Upvotes: 4
