A.mh

Reputation: 122

How to run GitHub code in a Jupyter Notebook?

I'm trying to run this code on my own computer: https://github.com/yicheng-w/CommonSenseMultiHopQA

I've downloaded the zip file, but I have no idea how I can run this code in a Jupyter notebook. It's my first time running code from GitHub, and I couldn't find any complete guide for it.

I have Python 3.6 on Windows.

Upvotes: 4

Views: 25120

Answers (2)

Sabito

Reputation: 5065

You can only use .ipynb files in Jupyter Notebook. Try following the instructions given and running the code in a terminal/cmd.

By instructions I mean what is written in the README.md file of that GitHub repo. See, the files in this repo are .py files, not .ipynb files. The correct way to run them is from a command prompt if you are on a Windows machine, or from a terminal if you are on Linux/Mac.
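As a minimal illustration, this is what running a .py file from a terminal looks like (hello.py is a made-up example file, not one from the repo):

```shell
# Create a tiny example script (hypothetical, just for illustration)
printf 'print("hello from a .py file")\n' > hello.py

# Run it with the Python interpreter
# (on Windows the interpreter is usually invoked as "python")
python3 hello.py
```

The repo's .py files are run the same way, just with their arguments appended after the filename.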

  • Step 0:

    Clone the repo

    Clone the repo that you have linked, namely CommonSenseMultiHopQA. To clone the repo you must have git installed on your system. If you don't have it, get it from here. When working with GitHub it is necessary to know how to use git. If you don't, then follow this tutorial.

  • Step 1:

    First, to setup the directory structure, please run setup.sh to create the appropriate directories.

    .sh files are shell scripts. Run them using the command ./setup.sh. Here is a tutorial on running them. Running this command will automatically create the necessary directory (folder) structure.
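    A minimal sketch of how such a script runs (example_setup.sh below is a stand-in written here for illustration; the repo's real setup.sh creates its own directories):

```shell
# A stand-in shell script, just to show the mechanics (not the repo's setup.sh)
printf '#!/bin/sh\nmkdir -p data raw_data\necho "directories created"\n' > example_setup.sh

chmod +x example_setup.sh   # make the script executable
./example_setup.sh          # run it from the current directory
```

    On Windows, .sh scripts can be run the same way from Git Bash, which comes with the git installation.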

  • Step 2:

    cd raw_data
    git clone https://github.com/deepmind/narrativeqa.git
    

    The first command changes your directory to raw_data. These are shell commands (cd is available on Windows too); you can learn about them here. The second command clones the narrativeqa repo into the raw_data folder.

  • Step 3:

    For this step, you must know how to run .py files from the cmd/terminal. Watch this video for that.

    We need to build processed datasets with extracted commonsense information. For NarrativeQA, we run:

    python src/config.py \
        --mode build_dataset \
        --data_dir raw_data/narrativeqa \
        --load_commonsense \
        --commonsense_file data/cn_relations_orig.txt \
        --processed_dataset_train data/narrative_qa_train.jsonl \
        --processed_dataset_valid data/narrative_qa_valid.jsonl \
        --processed_dataset_test data/narrative_qa_test.jsonl
    

    To build processed datasets with extracted commonsense for WikiHop, we run:

    python src/config.py \
        --mode build_wikihop_dataset \
        --data_dir raw_data/qangaroo_v1.1 \
        --load_commonsense \
        --commonsense_file data/cn_relations_orig.txt \
        --processed_dataset_train data/wikihop_train.jsonl \
        --processed_dataset_valid data/wikihop_valid.jsonl
    

    Both of the long commands run the config.py file inside the src folder. The --something flags are arguments passed to the python <filename> command. The first one:

    python src/config.py \
        --mode build_dataset \
        --data_dir raw_data/narrativeqa \
        --load_commonsense \
        --commonsense_file data/cn_relations_orig.txt \
        --processed_dataset_train data/narrative_qa_train.jsonl \
        --processed_dataset_valid data/narrative_qa_valid.jsonl \
        --processed_dataset_test data/narrative_qa_test.jsonl
    

    extracts commonsense for narrativeqa and the second:

    python src/config.py \
        --mode build_wikihop_dataset \
        --data_dir raw_data/qangaroo_v1.1 \
        --load_commonsense \
        --commonsense_file data/cn_relations_orig.txt \
        --processed_dataset_train data/wikihop_train.jsonl \
        --processed_dataset_valid data/wikihop_valid.jsonl
    

    extracts commonsense for WikiHop...

  • Finally, the following commands are for Training & Evaluation:

    Training

    To train models for NarrativeQA, run:

    python src/config.py \
        --version {commonsense_nqa, baseline_nqa} \
        --model_name <model_name> \
        --processed_dataset_train data/narrative_qa_train.jsonl \
        --processed_dataset_valid data/narrative_qa_valid.jsonl \
        --batch_size 24 \
        --max_target_iterations 15 \
        --dropout_rate 0.2
    

    To train models for WikiHop, run:

    python src/config.py \
        --version {commonsense_wh, baseline_wh} \
        --model_name <model_name> \
        --elmo_options_file lm_data/wh/elmo_2x4096_512_2048cnn_2xhighway_options.json \
        --elmo_weight_file lm_data/wh/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5 \
        --elmo_token_embedding_file lm_data/wh/elmo_token_embeddings.hdf5 \
        --elmo_vocab_file lm_data/wh/wikihop_vocab.txt \
        --processed_dataset_train data/wikihop_train.jsonl \
        --processed_dataset_valid data/wikihop_valid.jsonl \
        --multiple_choice \
        --max_target_iterations 4 \
        --max_iterations 8 \
        --batch_size 16 \
        --max_context_iterations 1300 \
        --dropout_rate 0.2
    

    Evaluation

    To evaluate NarrativeQA, we need to first generate official answers on the test set. To do so, run:

    python src/config.py \
        --mode generate_answers \
        --processed_dataset_valid data/narrative_qa_valid.jsonl \
        --processed_dataset_test data/narrative_qa_test.jsonl
    

Alternatively, if you really want to run these commands in a Jupyter notebook, then all you need to do is add a ! before each of them and run them in different cells.
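As an aside, the --flag value pairs in the long commands above are ordinary command-line arguments. In Python they are typically parsed with the standard-library argparse module; a minimal sketch follows (the flag names are copied from the commands above, but the repo's src/config.py defines its own, much larger set):

```python
# Minimal sketch of command-line flag parsing with argparse (standard library).
# These three flags mirror the build_dataset command; src/config.py has many more.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--mode")                                   # e.g. build_dataset
parser.add_argument("--data_dir")                               # e.g. raw_data/narrativeqa
parser.add_argument("--load_commonsense", action="store_true")  # boolean switch

# Simulates: python src/config.py --mode build_dataset \
#     --data_dir raw_data/narrativeqa --load_commonsense
args = parser.parse_args(
    ["--mode", "build_dataset", "--data_dir", "raw_data/narrativeqa", "--load_commonsense"]
)
print(args.mode)              # build_dataset
print(args.data_dir)          # raw_data/narrativeqa
print(args.load_commonsense)  # True
```

This is only to show what the flags mean; you do not need to write any of this yourself, just pass the flags on the command line as shown in the README.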

Upvotes: 5

Jeel Gondaliya

Reputation: 147

When you use ! in any cell of your Jupyter notebook, it acts as a command, similar to the terminal. Here is a sample example of how to clone a Git repo in a Jupyter Notebook.

! git clone https://github.com/yicheng-w/CommonSenseMultiHopQA.git

After writing the above command in a cell, run the cell to execute it, and the Git repo will be cloned into your notebook's working directory.

Upvotes: 4
