Vivek Rathod

Reputation: 68

Querying over large data under NHibernate transaction

I understand that explicit transactions should be used even for reading data, but I am unable to understand why the code below runs much slower under an NHibernate transaction than it does without one.

    session.BeginTransaction();
    var result = session.Query<Order>().Where(o => o.OrderNumber > 0).Take(100).ToList();
    session.Transaction.Commit();

I can post more detailed unit-test code if needed, but when querying over 50,000 Order records, this query takes about 1 second under NHibernate's explicit transaction and only about 15-20 msec without one.

Update 1/15/2019 Here is the detailed code

[Test]
        public void TestQueryLargeDataUnderTransaction()
        {
            int count = 50000;
            using (var session = _sessionFactory.OpenSession())
            {
                Order order;
                // write large amount of data
                session.BeginTransaction();
                for (int i = 0; i < count; i++)
                {

                    order = new Order {OrderNumber = i, OrderDate = DateTime.Today};
                    OrderLine ol1 = new OrderLine {Amount = 1 + i, ProductName = $"sun screen {i}", Order = order};
                    OrderLine ol2 = new OrderLine {Amount = 2 + i, ProductName = $"banjo {i}", Order = order};
                    order.OrderLines = new List<OrderLine> {ol1, ol2};

                    session.Save(order);
                    session.Save(ol1);
                    session.Save(ol2);
                }

                session.Transaction.Commit();

                Stopwatch s = Stopwatch.StartNew();

                // read the same data 
                session.BeginTransaction();
                var result = session.Query<Order>().Where(o => o.OrderNumber > 0).Skip(0).Take(100).ToList();
                session.Transaction.Commit();

                s.Stop();
                Console.WriteLine(s.ElapsedMilliseconds);
            }
        }

Upvotes: 1

Views: 256

Answers (1)

Oskar Berggren

Reputation: 5648

Your for-loop iterates 50000 times, creating 3 objects per iteration. So by the time you reach the first call to Commit(), the session is tracking 150000 objects that it will flush to the database at commit time or earlier (subject to your id generator policy and flush mode).

So far, so good. NHibernate is not necessarily optimised to handle that many objects in the session, but it can be acceptable provided one is careful.

On to the problem...

It's important to realize that committing the transaction does not remove the 150000 objects from the session.

When you later perform the query, NHibernate will notice that it is inside a transaction, in which case, by default, "auto-flushing" is performed. This means that before sending the SQL query to the database, NHibernate checks whether any of the objects known to the session have changes that might affect the outcome of the query (this is somewhat simplified). If such changes are found, they are transmitted to the database before the actual SQL query is executed. This ensures that the query can filter based on changes made in the same session.

The extra second you are seeing is the time it takes NHibernate to iterate over the 150000 objects known to the session, checking each for changes. The primary use cases for NHibernate rarely involve more than a few dozen to a few hundred objects, in which case the time needed to check for changes is negligible.

You can use a new session for the query to avoid this effect, or you can call session.Clear() immediately after the first commit. (Note that for production code, session.Clear() can be dangerous: it discards all tracked state, including any unflushed changes.)
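A sketch of both options, assuming the same `_sessionFactory`, `Order` entity, and bulk-insert loop as in the question:

```csharp
// Option 1: clear the session right after the bulk-insert commit, so the
// subsequent query has no tracked objects to scan during auto-flush.
// Safe here only because everything was just committed; Clear() would
// silently discard any still-unflushed changes.
session.Transaction.Commit();
session.Clear();

session.BeginTransaction();
var result = session.Query<Order>()
                    .Where(o => o.OrderNumber > 0)
                    .Take(100)
                    .ToList();
session.Transaction.Commit();

// Option 2: perform the read in a fresh session. A new session tracks
// no objects, so the auto-flush dirty check has nothing to iterate over.
using (var readSession = _sessionFactory.OpenSession())
using (var tx = readSession.BeginTransaction())
{
    var orders = readSession.Query<Order>()
                            .Where(o => o.OrderNumber > 0)
                            .Take(100)
                            .ToList();
    tx.Commit();
}
```

Option 2 is generally the safer habit, since it also avoids the stale-state pitfalls of long-lived sessions described below.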

Additional: The auto-flushing happens when querying but only if inside a transaction. This behaviour can be controlled using session.FlushMode. During auto-flush NHibernate will aim to flush only objects that may affect the outcome of the query (i.e. which database tables are affected).
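For illustration, this is roughly how the flush mode can be changed on the session from the question (FlushMode.Commit defers flushing until Commit(), so queries inside the transaction no longer trigger the auto-flush dirty check):

```csharp
// Trade-off: with FlushMode.Commit, a query will NOT see changes made
// earlier in the same session/transaction until Commit() flushes them.
session.FlushMode = FlushMode.Commit;

session.BeginTransaction();
var result = session.Query<Order>()
                    .Where(o => o.OrderNumber > 0)
                    .Take(100)
                    .ToList();
session.Transaction.Commit(); // pending changes are flushed here
```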

There is an additional effect to be aware of with regards to keeping sessions around. Consider this code:

using (var session = _sessionFactory.OpenSession())
{
    Order order;
    session.BeginTransaction();
    for (int i = 0; i < count; i++)
    {
        // Your code from above.
    }

    session.Transaction.Commit();

    // The order variable references the last order created. Let's modify it.
    order.OrderDate = DateTime.Today.AddDays(4);

    session.BeginTransaction();
    var result = session.Query<Order>().Skip(0).Take(100).ToList();
    session.Transaction.Commit();
}

What happens to the change to the order date made after the first call to Commit()? That change will be persisted to the database when the query is performed in the second transaction, despite the fact that the modification itself happened before that transaction was started. Conversely, if you remove the second transaction, that modification will of course not be persisted.

There are multiple ways to manage sessions and transactions that can be used for different purposes. However, by far the easiest is to always follow this simple unit-of-work pattern:

  1. Open session.
  2. Immediately open transaction.
  3. Perform a reasonable amount of work.
  4. Commit or rollback transaction.
  5. Dispose transaction.
  6. Dispose session.
  7. Discard all objects loaded using the session. At this point they can still be used in memory, but any changes will not be persisted. Safer to just get rid of them.
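The steps above can be sketched as follows (assuming the same `_sessionFactory` and `Order` entity as in the question):

```csharp
List<Order> orders;

using (var session = _sessionFactory.OpenSession())   // 1. open session
using (var tx = session.BeginTransaction())           // 2. immediately open transaction
{
    orders = session.Query<Order>()                   // 3. a reasonable amount of work
                    .Where(o => o.OrderNumber > 0)
                    .Take(100)
                    .ToList();

    tx.Commit();                                      // 4. commit (or tx.Rollback())
}                                                     // 5+6. dispose transaction, then session

// 7. 'orders' is now detached: the objects are usable in memory,
//    but further changes to them will not be persisted.
```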

Upvotes: 2
