Tania

Reputation: 1925

Using SQLAlchemy sessions and transactions

While learning SQLAlchemy I came across two ways of dealing with SQLAlchemy's sessions. One was creating the session once globally while initializing my database like:

DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))

and importing this DBSession instance into all my request handlers (all my insert/update operations) that follow. When I do this, my DB operations have the following structure:

with transaction.manager:
    for each_item in huge_file_of_million_rows:
        DBSession.add(each_item)
        # more create, read, update and delete operations

I do not commit, flush, or rollback anywhere, assuming my Zope transaction manager takes care of it for me (it commits at the end of the transaction, or rolls back if it fails).
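For reference, what the transaction manager does can be sketched with plain SQLAlchemy 1.4+, where `Session.begin()` gives the same commit-on-success / rollback-on-error contract (the `Item` model and the 1000-row figure here are invented for illustration):

```python
from sqlalchemy import create_engine, Column, Integer, String, select, func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Item(Base):  # invented stand-in for the rows in the file
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

# Session.begin() commits if the block succeeds and rolls back if it
# raises -- the same contract the Zope transaction manager provides.
with Session.begin() as session:
    for each_item in (Item(name="row %d" % i) for i in range(1000)):
        session.add(each_item)

with Session() as check:
    count = check.execute(select(func.count()).select_from(Item)).scalar()
```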

The second way, and the one most frequently mentioned on the web, was to create a session factory once:

DBSession = sessionmaker(bind=engine)

and then create a session instance from it per transaction:

session = DBSession()

for row in huge_file_of_million_rows:
    for item in row:
        try:
            session.add(item)
            # more create, read, update and delete operations
            session.flush()
            session.commit()
        except:
            session.rollback()
session.close()
  1. I do not understand which is BETTER (in terms of memory usage, performance, and overall health) and why.

  2. In the first method, I accumulate all the objects in the session and the commit happens at the end. For a bulky insert operation, does adding objects to the session store them in memory (RAM) or elsewhere? Where do they get stored, and how much memory is consumed?

  3. Both ways tend to be very slow when I have about a million inserts and updates. Trying SQLAlchemy Core also takes the same time to execute: 100K rows of select, insert, and update operations take about 25-30 minutes. Is there any way to reduce this?

Please point me in the right direction. Thanks in advance.

Upvotes: 1

Views: 2694

Answers (1)

lrnzcig

Reputation: 3947

Here you have a very generic answer, with the warning that I don't know much about Zope. Just some simple database heuristics. Hope it helps.

  1. How to use SQLAlchemy sessions: first, take a look at their own explanation here

As they say:

The calls to instantiate Session would then be placed at the point in the application where database conversations begin.

I'm not sure I understand what you mean by method 1; just in case, a warning: you should not have just one session for the whole application. You instantiate a Session when a database conversation begins, and you surely have several points in the application where different conversations begin. (I'm not sure from your text whether you have different users.)
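In other words: the factory is created once, but a session is instantiated per conversation. A minimal sketch of that shape (the `handle_request` helper is invented; substitute your framework's request hook):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite:///:memory:")  # once, at application start
Session = sessionmaker(bind=engine)           # the factory is also global

def handle_request(work):
    session = Session()  # one fresh session per conversation/request
    try:
        result = work(session)
        session.commit()
        return result
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

answer = handle_request(lambda s: s.execute(text("SELECT 6 * 7")).scalar())
```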

  2. One commit at the end of a huge number of operations is not a good idea

Indeed it will consume memory, probably in the Session object of your Python program, and surely in the database transaction. How much space? That's difficult to say with the information you provide; it will depend on the queries, on the database...

You could easily estimate it with a profiler. Take into account that if you run out of resources, everything will go slower (or halt).
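For instance, the standard library's `tracemalloc` gives a rough picture of what the pending objects cost in the Python process (the `Item` model and the 10,000-row figure are invented; absolute numbers depend on your mappings and machine):

```python
import tracemalloc

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Item(Base):  # invented model, for illustration only
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

tracemalloc.start()
for i in range(10_000):
    session.add(Item(name="row %d" % i))  # pending objects pile up in RAM
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("pending objects hold roughly %.1f MB" % (current / 1e6))
session.rollback()
session.close()
```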

  3. One commit per record is also not a good idea when processing a bulk file

It means you are asking the database to persist changes every time, for every row. That is certainly too much. Try an intermediate number: commit every few hundred rows. But then it gets more complicated; one commit at the end of the file assures you that the file is either fully processed or not at all, while intermediate commits force you to take into account, when something fails, that your file is half processed and you need to reposition.
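A sketch of that middle ground, with the caveat that the `Item` model, the batch size of 500, and the 2000-row file are invented; the right batch size depends on your rows and your database:

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Item(Base):  # invented model, for illustration only
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

BATCH_SIZE = 500
session = Session()
try:
    for n, name in enumerate(("row %d" % i for i in range(2000)), start=1):
        session.add(Item(name=name))
        if n % BATCH_SIZE == 0:
            session.commit()  # persist the batch and release its memory
    session.commit()          # don't forget the final partial batch
except Exception:
    session.rollback()        # on failure, only the current batch is lost
    raise
finally:
    session.close()
```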

As for the times you mention, it is very difficult to judge with only the information you provide, plus what your database and machine are. Anyway, the order of magnitude of your numbers (a select+insert+update every 15 ms, probably plus commit) sounds pretty high, but more or less in the expected range (again, it depends on queries + database + machine)... If you frequently have to insert so many records, you could consider other database solutions; it will depend on your scenario, probably on dialects, and may not be provided by an ORM like SQLAlchemy.
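One thing worth checking on the Core side is that the file is loaded as a single executemany inside one transaction, rather than one statement per row; a sketch (table and data invented, SQLAlchemy 1.4+ style):

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func, insert, select)

metadata = MetaData()
items = Table(
    "items", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

rows = [{"name": "row %d" % i} for i in range(100_000)]
with engine.begin() as conn:           # one transaction, commits at the end
    conn.execute(insert(items), rows)  # executemany: batched at driver level
    count = conn.execute(select(func.count()).select_from(items)).scalar()
```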

Upvotes: 2
