Reputation: 1550
I'm trying to import a package from my project which is not in the same directory as scrapy is in. The directory structure for my project is as follows:
Main
    __init__.py
    /XPaths
        __init__.py
        XPaths.py
    /scrapper
        scrapy.cfg
        /scrapper
            __init__.py
            settings.py
            items.py
            pipelines.py
            /spiders
                myspider.py
I'm trying to access XPaths.py from within myspider.py. Here are my attempts:
1) from Main.XPaths.XPaths import XPathsHandler
2) from XPaths.XPaths import XPathsHandler
3) from ..Xpaths.XPaths import XPathsHandler
These failed with the error:
ImportError: No module named .......
My last attempt was:
4) from ...Xpaths.XPaths import XPathsHandler
Which also failed with the error:
ValueError: Attempted relative import beyond toplevel package
What am I doing wrong? XPaths is independent of Scrapy, so the file structure has to stay that way.
//EDIT
After some further debugging following @alecxe's comment, I tried adding the path to Main to sys.path and printing sys.path before importing XPaths. The weird thing is that the scrapper directory gets appended to the path when I run Scrapy. Here's what I added:
'C:\\Users\\LaptOmer\\Code\\Python\\PythonBackend\\Main'
And here's what I get when I print sys.path:
'C:\\Users\\LaptOmer\\Code\\Python\\PythonBackend\\Main\\scrapper'
Why does Scrapy append that to the path?
Upvotes: 4
Views: 2343
Reputation: 7457
I have a similar directory structure, with multiple scrapers (say, directories scraper1 and scraper2).
Since I found the sys.path changes suggested by @ErdraugPl too brittle (see @ethanenglish's problems), especially since Scrapy itself modifies sys.path, I chose an OS-level solution instead of a Python one: I created a symbolic link to the /XPaths directory in both scraper1 and scraper2. That way, I can still maintain a single XPaths module that I use in both scraper1 and scraper2, and can simply do:
from XPaths.XPaths import XPathsHandler
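The symlink step can also be scripted. Here is a minimal sketch, assuming Python 3 and a layout like the one in the question; the helper name link_shared_package and the directory names are made up for illustration and are not part of Scrapy. On Windows, creating symlinks may require administrator rights or developer mode.

```python
import os
from pathlib import Path


def link_shared_package(shared_pkg, scraper_dir):
    """Create scraper_dir/<package name> as a symlink to shared_pkg.

    Hypothetical helper: returns the link path and leaves an existing
    entry of that name untouched, so it is safe to call more than once.
    """
    shared_pkg = Path(shared_pkg).resolve()
    link = Path(scraper_dir) / shared_pkg.name
    # lexists() is also true for a broken symlink, so nothing is overwritten.
    if not os.path.lexists(link):
        os.symlink(shared_pkg, link, target_is_directory=True)
    return link
```

Calling link_shared_package("Main/XPaths", "Main/scraper1") and then the same for scraper2 would give each scraper its own XPaths entry that points at the single shared package.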
Upvotes: 0
Reputation: 1327
I ran into the same problem.
When I used:
sys.path.append(os.path.join(os.path.dirname(__file__), '../..'))
it appended ../.. to the last file path, which didn't work. I noticed that my main file was the last item in the sys.path list, so I took that last item and went up to the module level to find my main file, which contains a function called "extract_notes".
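import os
import sys
from pprint import pprint as p
import scrapy
# Take the last sys.path entry and put its parent directory at the front of the path.
mod_path = os.path.dirname(os.path.normpath(sys.path[-1]))
sys.path.insert(0, mod_path)
# main is only importable after the path change above.
from main import extract_notes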
Hope that helps.
Upvotes: 0
Reputation: 404
I know it's a somewhat messy solution, but it's the only one I could find when I had the same problem. Before importing files from your project, you need to manually append the path of your topmost package level to sys.path, i.e.:
import os
import sys
sys.path.append(os.path.join(os.path.dirname(__file__), '../..'))
from XPaths.XPaths import XPathsHandler
...
From what I understand, Scrapy creates its own package, which is why you cannot import files from other directories. This also explains the error:
ValueError: Attempted relative import beyond toplevel package
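For what it's worth, the same idea can be written so that the path is resolved to an absolute directory and not added twice. This is only a sketch: add_ancestor_to_sys_path is a made-up helper name, and the number of levels is an assumption that depends on your layout.

```python
import sys
from pathlib import Path


def add_ancestor_to_sys_path(start_file, levels_up):
    """Put an ancestor directory of start_file at the front of sys.path.

    Hypothetical helper. levels_up counts directories above the one that
    contains start_file. Returns the resolved directory as a string.
    """
    ancestor = str(Path(start_file).resolve().parents[levels_up])
    if ancestor not in sys.path:
        sys.path.insert(0, ancestor)
    return ancestor


# Assumed usage at the top of myspider.py, before the project import.
# If spiders/ sits inside the inner scrapper package (the usual Scrapy
# layout), Main is three levels above the spiders directory:
# add_ancestor_to_sys_path(__file__, 3)
# from XPaths.XPaths import XPathsHandler
```

Resolving the path first avoids entries like .../spiders/../.. in sys.path, which makes it easier to see what was actually added when you print it.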
Upvotes: 1