zeleven

Reputation: 456

Scrapy: using SQLAlchemy in a pipeline raises "NameError: name 'connection' is not defined"

Using SQLAlchemy in Scrapy raises a NameError. The error message is as follows:

Traceback (most recent call last):
  File "e:\weibo_spider\venv\lib\site-packages\twisted\internet\defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "e:\weibo_spider\venv\lib\site-packages\scrapy\crawler.py", line 79, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
NameError: name 'connection' is not defined

And here is my Scrapy Pipeline class:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from .models import MyModel # my sqlalchemy model


class WeiboSpiderPipeline(object):

    def open_spider(self, spider):
        # using pymysql as the connector
        engine = create_engine('mysql+pymysql://root@localhost/wbspider_data')
        Session = sessionmaker(bind=engine)
        self.conn = engine.connect()
        self.session = Session(bind=connection)

    def close_spider(self, spider):
        self.conn.close()

    def process_item(self, item, spider):
        return item

I tested the model from the command line and it works, but the NameError occurs when I run the scrapy crawl myspidername command.

Help!

Upvotes: 0

Views: 1565

Answers (2)

Dirk R

Reputation: 739

You have a simple error in your code.

Look at this line:

self.session = Session(bind=connection)

No variable named connection is defined anywhere, hence the NameError you are receiving. The connection you opened on the previous line is stored in self.conn.

Replace that line with this:

self.session = Session(bind=self.conn)
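
For context, here is a minimal sketch of the whole pipeline with that one-line fix applied. It is not the asker's exact code: the db_url attribute and the in-memory SQLite URL are assumptions made so the sketch runs without a MySQL server, the model import is left out, and the extra cleanup in close_spider is optional.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker


class WeiboSpiderPipeline(object):
    # Assumption: in-memory SQLite so the sketch runs anywhere. The question
    # uses 'mysql+pymysql://root@localhost/wbspider_data' instead.
    db_url = 'sqlite://'

    def open_spider(self, spider):
        self.engine = create_engine(self.db_url)
        Session = sessionmaker(bind=self.engine)
        self.conn = self.engine.connect()
        # Bind to the connection stored on self, not to an undefined name.
        self.session = Session(bind=self.conn)

    def close_spider(self, spider):
        # Close the session first, then the connection, then the engine's pool.
        self.session.close()
        self.conn.close()
        self.engine.dispose()

    def process_item(self, item, spider):
        return item


# Drive the pipeline by hand; Scrapy normally calls these hooks itself.
pipeline = WeiboSpiderPipeline()
pipeline.open_spider(spider=None)
result = pipeline.session.execute(text('SELECT 1')).scalar()
pipeline.close_spider(spider=None)
```

Keeping the engine on self is a design choice of this sketch, so that close_spider can release everything that open_spider created.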

Upvotes: 1

Umar Asghar

Reputation: 4064

Use this approach:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('mysql+pymysql://root@localhost/wbspider_data')
Session = sessionmaker(bind=engine)
session = Session()

# session.close() returns the connection to the engine's connection pool.
# The connection is not closed; it only becomes idle.
session.close()
# engine.dispose() closes all idle connections held in the pool.
engine.dispose()


# To close the connection directly on session.close(),
# disable pooling with NullPool:

from sqlalchemy.pool import NullPool
engine = create_engine(
      'postgresql+psycopg2://scott:tiger@localhost/test',
      poolclass=NullPool)
session = sessionmaker(bind=engine)()
session.close()
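
The pooling behaviour described in the comments above can be observed with a small sketch. It assumes an in-memory SQLite database with an explicit QueuePool (neither is from the answer) and uses the pool's checkedout() counter to show that session.close() hands the connection back to the pool rather than closing it.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import QueuePool

# Assumption: in-memory SQLite with an explicit QueuePool, chosen only so the
# sketch runs without a database server.
engine = create_engine('sqlite://', poolclass=QueuePool)
session = sessionmaker(bind=engine)()

# Running a statement makes the session check a connection out of the pool.
session.execute(text('SELECT 1')).scalar()
in_use_during = engine.pool.checkedout()

# close() returns the connection to the pool, where it sits idle.
session.close()
in_use_after_close = engine.pool.checkedout()

# dispose() closes the idle connections held by the pool.
engine.dispose()
```

Under these assumptions, in_use_during should be 1 and in_use_after_close should be 0.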

Upvotes: 2
