user3088202

Reputation: 3124

Import data frame from one Jupyter Notebook file to another

I have 3 separate Jupyter notebook files that deal with separate data frames. I clean and manipulate the data in these notebooks for each df. Is there a way to reference the cleaned-up/final data in a separate notebook?

My concern is that if I work on all 3 dfs in one notebook and then do more with them afterwards (merge/join), it will be a mile long. I also don't want to rewrite a bunch of code just to get the data ready for use in my new notebook.

Upvotes: 9

Views: 10881

Answers (1)

JeremyDouglass

Reputation: 1471

If you are using pandas data frames, one approach is to use pandas.DataFrame.to_csv() to save the cleaned data at the end of each notebook and pandas.read_csv() to load it at the start of the next.

  1. Notebook1 loads input1 and saves result1.
  2. Notebook2 loads result1 and saves result2.
  3. Notebook3 loads result2 and saves result3.
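To make the chain concrete, here is a minimal sketch that runs all three stages in one script, with each loop iteration standing in for one notebook. The temporary directory, the `path()` helper and the three transformations are hypothetical placeholders; in practice each stage would live in its own notebook and write to your project folder.

```python
import os
import tempfile

import pandas as pd

# Work in a throwaway directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()


def path(name):
    # Hypothetical helper: build a file path inside the temporary directory.
    return os.path.join(workdir, name)


# Stand-in for the original input data.
pd.DataFrame({'id': [10, 20, 30], 'name': ['foo', 'bar', 'baz']}).to_csv(path('input.csv'))

# Each stage mirrors one notebook: load the previous file, transform it, save the result.
# The transformations are placeholders for your real cleaning code.
stages = [
    ('input.csv', 'result1.csv', lambda df: df.assign(name=df['name'].str.upper())),
    ('result1.csv', 'result2.csv', lambda df: df[df['id'] > 10]),
    ('result2.csv', 'result3.csv', lambda df: df.assign(id_x2=df['id'] * 2)),
]

for src, dst, transform in stages:
    df = pd.read_csv(path(src), index_col=0)
    df = transform(df)
    df.to_csv(path(dst))

final = pd.read_csv(path('result3.csv'), index_col=0)
print(final)
```

Because every stage only depends on the file written by the previous one, you can rerun a later notebook without rerunning the earlier ones, as long as their output files are up to date.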

If this is your data:

import pandas as pd
raw_data = {'id': [10, 20, 30],
            'name': ['foo', 'bar', 'baz']
           }
input_df = pd.DataFrame(raw_data, columns=['id', 'name'])
# write it out so notebook1 has an input.csv to load
input_df.to_csv('input.csv')

Then in notebook1.ipynb, process it like this:

import pandas as pd

# load
df = pd.read_csv('input.csv', index_col=0)
# manipulate frame here
# ...
# save
df.to_csv('result1.csv')

...and repeat the process for each stage in the chain. For example, in notebook2.ipynb:

import pandas as pd

# load
df = pd.read_csv('result1.csv', index_col=0)
# manipulate frame here
# ...
# save
df.to_csv('result2.csv')
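The index_col=0 argument in each load step matters because to_csv() writes the index as an unnamed first column by default. A minimal sketch of the round trip, using an in-memory io.StringIO buffer instead of a real file (an assumption made only so the example is self-contained):

```python
import io

import pandas as pd

df = pd.DataFrame({'id': [10, 20, 30], 'name': ['foo', 'bar', 'baz']})

buf = io.StringIO()
df.to_csv(buf)  # the index is written as an unnamed first column
csv_text = buf.getvalue()

# Without index_col=0 the old index comes back as an extra 'Unnamed: 0' column.
without = pd.read_csv(io.StringIO(csv_text))
# With index_col=0 the first column is restored as the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(without.columns))  # ['Unnamed: 0', 'id', 'name']
print(restored.equals(df))    # True
```

If you do not need the index at all, saving with df.to_csv('result1.csv', index=False) and loading without index_col is an equally valid choice; just be consistent across the notebooks.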

At the end, your project folder will contain:

  • input.csv
  • notebook1.ipynb
  • notebook2.ipynb
  • notebook3.ipynb
  • result1.csv
  • result2.csv
  • result3.csv
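If, as in the question, the three notebooks each clean a different data frame rather than forming a chain, the same save/load idea still applies: a final notebook loads each saved result and merges them, without rerunning any cleaning code. A minimal sketch; the file names clean_a.csv, clean_b.csv and clean_c.csv, their columns, and the shared 'id' join key are all hypothetical:

```python
import os
import tempfile

import pandas as pd

workdir = tempfile.mkdtemp()

# Pretend each of the three cleaning notebooks has already saved its result.
pd.DataFrame({'id': [10, 20, 30], 'name': ['foo', 'bar', 'baz']}).to_csv(
    os.path.join(workdir, 'clean_a.csv'))
pd.DataFrame({'id': [10, 20, 30], 'score': [1.5, 2.5, 3.5]}).to_csv(
    os.path.join(workdir, 'clean_b.csv'))
pd.DataFrame({'id': [20, 30, 40], 'group': ['x', 'y', 'z']}).to_csv(
    os.path.join(workdir, 'clean_c.csv'))

# In the final notebook: load each cleaned frame from disk.
a = pd.read_csv(os.path.join(workdir, 'clean_a.csv'), index_col=0)
b = pd.read_csv(os.path.join(workdir, 'clean_b.csv'), index_col=0)
c = pd.read_csv(os.path.join(workdir, 'clean_c.csv'), index_col=0)

# Inner joins (the merge default) on 'id' keep only ids present in all three frames.
merged = a.merge(b, on='id').merge(c, on='id')
print(merged)
```

Pass how='left' or how='outer' to merge() if rows missing from some of the frames should be kept.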

Documentation: pandas.DataFrame.to_csv(), pandas.read_csv()

Upvotes: 2
