jbkkd

Reputation: 1550

Scrapy: Importing a package from the project that's not in the same directory

I'm trying to import a package from my project which is not in the same directory as scrapy is in. The directory structure for my project is as follows:

Main
  __init__.py
  /XPaths
    __init__.py
    XPaths.py
  /scrapper
    scrapy.cfg
    /scrapper
      __init__.py
      settings.py
      items.py
      pipelines.py
      /spiders
        myspider.py

I'm trying to access XPaths.py from within myspider.py. Here are my attempts:

1) from Main.XPaths.XPaths import XPathsHandler

2) from XPaths.XPaths import XPathsHandler

3) from ..Xpaths.XPaths import XPathsHandler

These failed with the error:

ImportError: No module named .......

My last attempt was:

4) from ...Xpaths.XPaths import XPathsHandler

Which also failed with the error:

ValueError: Attempted relative import beyond toplevel package

What am I doing wrong? XPaths is independent from Scrapy, therefore the file structure has to stay that way.

//EDIT

After some further debugging following @alecxe's comment, I tried adding the path to Main to sys.path and printing it before importing XPaths. The weird thing is that the scrapper directory gets appended to the path when I run scrapy. Here's what I added:

'C:\\Users\\LaptOmer\\Code\\Python\\PythonBackend\\Main'

And here's what I get when I print sys.path:

'C:\\Users\\LaptOmer\\Code\\Python\\PythonBackend\\Main\\scrapper'

Why does scrapy append that to the path?
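
For reference, a minimal sketch of what the top of myspider.py looks like with this debugging in place (the exact placement of these lines is approximate):

import sys

# manually add the Main directory so the XPaths package becomes importable
sys.path.append('C:\\Users\\LaptOmer\\Code\\Python\\PythonBackend\\Main')

# inspect sys.path before importing; the scrapper directory shows up here too
print(sys.path)

from XPaths.XPaths import XPathsHandler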

Upvotes: 4

Views: 2343

Answers (3)

DaveFar

Reputation: 7457

I have a similar directory structure, with multiple scrapers (say directories scraper1 and scraper2).

Since I found the sys.path changes suggested by @ErdraugPl too brittle (see @ethanenglish's problems), especially since Scrapy itself modifies sys.path, I chose an OS solution instead of a Python one: I created a symbolic link to the /XPaths directory in both scraper1 and scraper2. That way I can still maintain a single XPaths module, use it from both scrapers, and simply do from XPaths.XPaths import XPathsHandler.
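
The link itself can be created with a shell command or, as a one-off, from Python's standard library. A minimal sketch using the paths from the question (the exact link location is an assumption, and on Windows creating directory symlinks may require elevated privileges):

import os

# point a link named XPaths inside the scraper project at the shared
# Main\XPaths directory, so "from XPaths.XPaths import XPathsHandler" resolves
os.symlink(
    'C:\\Users\\LaptOmer\\Code\\Python\\PythonBackend\\Main\\XPaths',            # real directory
    'C:\\Users\\LaptOmer\\Code\\Python\\PythonBackend\\Main\\scrapper\\XPaths',  # link location
    target_is_directory=True,
)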

Upvotes: 0

ethanenglish

Reputation: 1327

I ran into the same problem.

When I used:

sys.path.append(os.path.join(os.path.dirname(__file__), '../..'))

it appended ../.. to the last path in sys.path, which didn't work. I noticed that the last item in the sys.path list pointed at my main file, so I took that last item and went up to the module level to find my main file, which contains a function called "extract_notes".

import scrapy
import sys
import os

# sys.path[-1] is the last entry on the path; taking its dirname steps up
# one level to the directory that contains main.py
mod_path = os.path.dirname(os.path.normpath(sys.path[-1]))
sys.path.insert(0, mod_path)

from pprint import pprint as p
from main import extract_notes

Hope that helps.

Upvotes: 0

ErdraugPl

Reputation: 404

I know it's a bit of a messy solution, but it's the only one I could find when I had the same problem as you. Before importing files from your project, you need to manually append the path of your topmost package to sys.path, i.e.:

import os
import sys

sys.path.append(os.path.join(os.path.dirname(__file__), '../..'))
from XPaths.XPaths import XPathsHandler
...

From what I understand, Scrapy creates its own package, which is why you cannot import files from other directories. This also explains the error:

ValueError: Attempted relative import beyond toplevel package

Upvotes: 1
