Nemo

Reputation: 1227

Python libraries imported in a custom module are not recognised globally

I created a module called my_module.py as follows

import pandas as pd

def create_df(text):
    # Stand-in body: the original elides how the DataFrame is built from the text
    df = pd.DataFrame({'text': [text]})
    return df

In a cell in Jupyter notebook, I can create a dataframe like this

from my_module import create_df

txt = 'this is a test'
df = create_df(txt)

However, in another cell, when I ran this query

pd.DataFrame?

It returned

Object `pd.DataFrame` not found.

Could you please explain what's going on? Should I leave import pandas as pd out of my_module.py and instead declare it in a cell of the notebook?

Upvotes: 0

Views: 577

Answers (1)

neko

Reputation: 389

In your notebook you import your module, but you do not write:

import pandas as pd

so the name pd is never added to the namespace of your notebook (check dir(): you will not see pd listed), and your interpreter has no idea what pd.DataFrame is. pandas itself does get loaded when my_module is imported, but the name pd is bound only inside my_module.

I think your confusion stems from the assumption that, because you imported pandas in your module, it will also be available in your main script or notebook. It will not: you need to import it again, because names imported inside a module do not carry over to the code that imports that module.

EDIT: To be more specific, Python has the concept of a namespace: the collection of global names (variables, functions, imported modules) associated with a module.

Key concept: each module has its own namespace. numpy has one, pandas has one, your main.py script has one, and they are all separate.

When you write import pandas as pd in my_module.py, for example, you bind pandas to the name pd in the namespace of my_module.py. There, and only there, writing pd. gives you access to the components of the library.

If you now write from my_module import create_df in your main.py (or notebook, in your case), you add create_df to the namespace of main.py. main.py has no knowledge of the imports done in my_module.py because the two do not share a namespace, so you cannot use pd. there.
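Here is a minimal sketch of that separation. To keep it self-contained it builds a hypothetical stand-in for my_module.py in memory, with the standard-library math module playing the role of pandas and the alias m playing the role of pd; none of these names come from the question.

```python
import types

# Hypothetical stand-in for my_module.py: `math` replaces pandas, `m` replaces `pd`.
source = '''
import math as m

def circle_area(radius):
    return m.pi * radius ** 2
'''

# Build the module in memory and run its source inside its own namespace.
my_module = types.ModuleType("my_module")
exec(source, my_module.__dict__)

# Rough equivalent of `from my_module import circle_area`.
circle_area = my_module.circle_area

# The alias lives in the module's namespace, not in the caller's.
alias_in_module = "m" in vars(my_module)
alias_in_caller = "m" in globals()

print(alias_in_module, alias_in_caller)
print(circle_area(1.0))
```

The function still works from the caller because it carries a reference to the namespace it was defined in (circle_area.__globals__), which is where m is looked up.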

Likewise, you cannot import pandas as pd only in main.py, leave it out of my_module.py, and hope that it will be recognized: when you call create_df, it looks up its global names in the my_module.py namespace, does not find pd, and raises a NameError.
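The reverse case can be sketched the same way. Again this is only an illustration under assumptions: broken_module is a hypothetical in-memory module that forgets its own import, and math (aliased as m) stands in for pandas (aliased as pd).

```python
import math as m  # the alias exists in the caller's namespace only
import types

# Hypothetical module that uses `m` without importing it itself.
source = '''
def circle_area(radius):
    return m.pi * radius ** 2  # `m` is looked up in this module's globals
'''

broken_module = types.ModuleType("broken_module")
exec(source, broken_module.__dict__)

try:
    broken_module.circle_area(1.0)
    error_name = None
    error_message = ""
except NameError as exc:
    # The caller's `m` is not visible from inside broken_module.
    error_name = type(exc).__name__
    error_message = str(exc)

print(error_name, error_message)
```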

By the way, you can see which names are defined in a namespace by using the built-in function dir(). Called with no arguments, dir() lists the names in the namespace you call it from, while dir(pd) lists those of pandas (provided, of course, that pd is in your namespace, i.e. you have run import pandas as pd).
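A short sketch of both forms of dir(), again using the standard-library math module (aliased as m) as a stand-in for pandas so that it runs anywhere:

```python
import math as m

caller_names = dir()    # names defined in the current namespace
module_names = dir(m)   # names defined in the math module's namespace

# The alias is visible here, but the module's contents are reached only through it.
print("m" in caller_names)
print("pi" in module_names)
print("pi" in caller_names)
```

In a notebook, running dir() in a cell before and after import pandas as pd should show pd appearing in the list.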

Hope it is clearer!

Upvotes: 2
