Reputation: 1227
I created a module called my_module.py as follows:
import pandas as pd
def create_df(text):
    df = <create a dataframe from the text>
    return df
In a cell in Jupyter notebook, I can create a dataframe like this
from my_module import create_df
txt = 'this is a test'
df = create_df(txt)
However, in another cell, when I ran this query
pd.DataFrame?
It returned
Object `pd.DataFrame` not found.
Could you please explain what's going on? Should I leave import pandas as pd out of my_module.py and instead declare import pandas as pd in a cell of the notebook?
Upvotes: 0
Views: 577
Reputation: 389
In your notebook you import your module, but you do not write import pandas as pd there. As a result, the name pd is never added to the namespace of your notebook (check dir(): you will not see pd in it), and so the interpreter has no idea what pd.DataFrame is.
I think your confusion stems from the assumption that, because you imported pandas in your module, it will also be available in your main script or notebook. It will not: you need to import it again, because an import made inside a module does not carry over to the script that imports that module.
EDIT: To be more specific, Python has the concept of a namespace, which is the collection of global names associated with a module.
Key concept: each module has its own namespace. numpy has one, pandas has one, your main.py script has one, and they are separate.
When you write import pandas as pd in your my_module.py, for example, you bind pandas to the name pd in the namespace of my_module.py. There, and only there, you can access the components of the library through the pd. prefix.
If you now write from my_module import create_df in your main.py (or notebook, in your case), you add create_df to the namespace of main.py. main.py has no knowledge of the imports done in my_module.py because the two do not share a namespace, so you cannot use pd. here.
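To make this concrete, here is a small sketch you can run as-is. It is only an illustration: the module is built in memory, the names demo_module and circle_area are made up, and math as m stands in for pandas as pd so that nothing beyond the standard library is needed.

```python
import sys
import types

# Hypothetical stand-in for my_module.py: `math as m` plays the role of `pandas as pd`.
MODULE_SOURCE = """
import math as m

def circle_area(r):
    return m.pi * r ** 2
"""

# Build the module in memory and register it so a normal import statement finds it.
demo_module = types.ModuleType("demo_module")
exec(MODULE_SOURCE, demo_module.__dict__)
sys.modules["demo_module"] = demo_module

# A separate namespace playing the role of main.py or the notebook.
main_ns = {}
exec("from demo_module import circle_area", main_ns)

alias_in_module = "m" in vars(demo_module)    # the alias lives in the module's namespace
alias_in_main = "m" in main_ns                # but it was not copied to the importer
function_in_main = "circle_area" in main_ns   # only the requested name came over
area = main_ns["circle_area"](1.0)            # the function still finds m in its own module
```

The function keeps working because it looks up m in the namespace of the module where it was defined, not in the namespace of whoever calls it.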
Likewise, you cannot write import pandas as pd in main.py but not in my_module.py and hope that it will be recognized: when you call create_df, it looks up names in the my_module.py namespace, does not find pd, and raises a NameError.
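The reverse case can be sketched the same way. Again this is just an illustration with assumed names: broken_module and circle_area are hypothetical, and math as m stands in for pandas as pd.

```python
import types
import math as m  # imported by the "main script" only

# Hypothetical module that uses m without importing it itself.
broken_module = types.ModuleType("broken_module")
exec(
    "def circle_area(r):\n"
    "    return m.pi * r ** 2\n",
    broken_module.__dict__,
)

try:
    broken_module.circle_area(1.0)
    error_name = None
except NameError as exc:
    # The lookup happens in broken_module's namespace, where m was never bound.
    error_name = type(exc).__name__
```

Having m available in the calling namespace does not help, since the function only searches its own module's globals and then the builtins.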
By the way, you can see which names are defined in a namespace by using the built-in function dir(). Called with no argument, dir() lists the names in the namespace you called it from, while dir(pd) lists the attributes of pandas (provided, of course, that pd is in the namespace, i.e. you have run import pandas as pd).
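A quick sketch of both forms of dir(), using math as m as a stand-in for pandas as pd so it runs without any extra packages (the variable names are just for illustration):

```python
import math as m

names_here = dir()      # names visible in the current namespace
names_in_math = dir(m)  # attributes of the math module itself

has_alias = "m" in names_here   # the alias is present because we imported it here
has_pi = "pi" in names_in_math  # math.pi belongs to the module, not to this namespace
```

In a notebook, running dir() in a fresh cell before and after import pandas as pd should show pd appearing in the list.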
Hope it is clearer!
Upvotes: 2