robsosno

Reputation: 328

Processing large resultset with NHibernate

I have the following task: calculate interest for all active accounts. In the past I did this kind of thing with Ado.Net and stored procedures. This time I've tried to do it with NHibernate, because it seemed that complex algorithms would be easier to write against pure POCOs. So I want to do the following (pseudocode):

    foreach account in accounts
        calculate interest
        save account with new interest

I'm aware that NHibernate was not designed for processing large data volumes. For me it is sufficient to be able to organize such a loop without having all accounts in memory at once. To minimize memory usage I would use an IStatelessSession for the outer loop instead of a plain ISession.
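
In code, the shape I'm after is roughly this (just a sketch: the Account entity, its IsActive flag, the Interest property and CalculateInterest are placeholders for my real model, and sessionFactory is an ISessionFactory built elsewhere):

    // requires: using System.Linq; using NHibernate; using NHibernate.Linq;
    using (IStatelessSession session = sessionFactory.OpenStatelessSession())
    using (ITransaction tx = session.BeginTransaction())
    {
        foreach (Account account in session.Query<Account>().Where(a => a.IsActive))
        {
            account.Interest = CalculateInterest(account);
            session.Update(account); // stateless sessions track nothing, so update explicitly
        }
        tx.Commit();
    }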

I've tried the approach proposed by Ayende. There are two problems:

My program works, but after switching on ODBC tracing I saw in the debugger that all fetches were done before the lambda expression in .List was executed for the first time. I found another solution myself: session.Query with .AsEnumerable(), which I used in a foreach.
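
Roughly like this (same placeholder entity as above):

    // requires: using System.Linq; using NHibernate.Linq;
    foreach (Account account in session.Query<Account>()
                                       .Where(a => a.IsActive)
                                       .AsEnumerable())
    {
        // calculate and save interest here
    }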

Again, two problems:

I don't know why, but IQueryOver doesn't have AsEnumerable. It also doesn't have a List method that takes an argument (the way CreateQuery does). I've tried .Future, but again:

In summary: is there any equivalent in NHibernate to dataReader.Read() from Ado.Net?

My best alternative to a pure NHibernate approach would be a main loop using dataReader.Read() and then loading the account with each id inside the Ado.Net loop. However, performance will suffer: reading each account by key is slower than the sequence of fetches done in the outer loop.
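
That hybrid would look roughly like this (a sketch: the connection string, the SQL text and the column index are placeholders, and I use Get for an immediate fetch by key):

    // requires: using System.Data.Odbc; using NHibernate;
    using (OdbcConnection connection = new OdbcConnection(connectionString))
    using (ISession session = sessionFactory.OpenSession())
    {
        connection.Open();
        using (OdbcCommand command = new OdbcCommand(
            "SELECT Id FROM Account WHERE IsActive = 1", connection))
        using (OdbcDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // one extra round trip per account: fetch by primary key
                Account account = session.Get<Account>(reader.GetInt32(0));
                // calculate and save interest for this account
            }
        }
    }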

I'm using NHibernate version 4.0.0.4000.

Upvotes: 1

Views: 622

Answers (1)

Jaguar

Reputation: 5958

While it is true that NH was not designed with large-volume processing in mind, you can always circumvent this restriction with application-layer batch processing. I have found that, depending on the size of the object graph of the relevant entity, performance suffers after a certain number of objects have been loaded into memory (in one small project I could load 100,000 objects and performance remained acceptable; in another, with only 1,500 objects, any additional Load() would crawl).

In the past I have used paging to handle batch processing, when IStatelessSession result sets are too limited (they don't load proxies, etc.).

So you run a count query at the beginning, pick some arbitrary batch size, and then start doing your work batch by batch. This way you can neatly avoid the n+1 select problem, assuming that for each batch you explicitly fetch-join everything needed.

The caveat is that for this to work efficiently you will need to evict the processed entities of each batch from the ISession when you are done with them. And this means that you will have to commit the transaction on each batch. If you can live with multiple flush+commit cycles, this could work for you (a rough sketch follows).
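
Something along these lines (a sketch only: Account, its Owner association, the ordering by Id and the batch size of 500 are arbitrary placeholders):

    // requires: using System.Collections.Generic; using NHibernate;
    const int batchSize = 500;
    int total = session.QueryOver<Account>().RowCount(); // the up-front count query

    for (int offset = 0; offset < total; offset += batchSize)
    {
        using (ITransaction tx = session.BeginTransaction())
        {
            IList<Account> batch = session.QueryOver<Account>()
                .Fetch(a => a.Owner).Eager  // fetch-join whatever each batch needs
                .OrderBy(a => a.Id).Asc     // stable ordering so pages don't overlap
                .Skip(offset)
                .Take(batchSize)
                .List();

            foreach (Account account in batch)
            {
                // do the per-entity work here
            }

            tx.Commit();     // flush + commit once per batch
        }

        session.Clear();     // evict the processed batch so the session stays small
    }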

Otherwise you will have to go with the IStatelessSession, although there are no lazy queries there: "from Books" means "select * from dbo.Books" or something equivalent, and all results are fetched into memory.

Upvotes: 1
