user1613714
user1613714

Reputation:

Will this NHibernate query impact performance?

I am creating a website in ASP.NET MVC and use NHibernate as ORM. I have the following tables in my database:

Mapping:

    public BookmarkMap()
    {
        Table("Bookmarks");
        Id(x => x.Id).Column("Id").GeneratedBy.Identity();
        Map(x => x.Title);
        Map(x => x.Link);
        Map(x => x.DateCreated);
        Map(x => x.DateModified);
        References(x => x.User, "UserId");
        HasManyToMany(x => x.Tags).AsSet().Cascade.None().Table("TagsBookmarks").ParentKeyColumn("BookmarkId")
        .ChildKeyColumn("TagId");
    }

    public TagMap()
    {
        Table("Tags");
        Id(x => x.Id).Column("Id").GeneratedBy.Identity();
        Map(x => x.Title);
        Map(x => x.Description);
        Map(x => x.DateCreated);
        Map(x => x.DateModified);
        References(x => x.User, "UserId");
        HasManyToMany(x => x.Bookmarks).AsSet().Cascade.None().Inverse().Table("TagsBookmarks").ParentKeyColumn("TagId")
        .ChildKeyColumn("BookmarkId");
    }

I need the data from both the Bookmarks and Tags table. More specific: I need 20 bookmarks with their related tags. The first thing I do is select 20 bookmark ids from the Bookmarks table. I do this because paging doesn't work well on a cartesian product that I get in the second query.

First query:

IEnumerable<int> bookmarkIds = (from b in SessionFactory.GetCurrentSession().Query<Bookmark>()
                                where b.User.Username == username
                                orderby b.DateCreated descending
                                select b.Id).Skip((page - 1) * pageSize).Take(pageSize).ToList<int>();

After that I select the bookmarks for these ids.

Second query:

IEnumerable<Bookmark> bookmarks = (from b in SessionFactory.GetCurrentSession().Query<Bookmark>().Fetch(t => t.Tags)
                                   where b.User.Username == username && bookmarkIds.Contains(b.Id)
                                   orderby b.DateCreated descending
                                   select b);

The reason I use fetch is because I want to avoid N+1 queries. This works but results in a cartesian product. I have read in some posts that you should avoid cartesian products, but I don't really know how to do this in my case.

I have also read something about setting a batch size for the N+1 queries. Is this really faster than this single query?

An user can add max 5 tags to a bookmark. I select 20 bookmarks per page so worst case scenario for this second query is: 5 * 20 = 100 rows.

Will this impact performance when I have lots of data in the Bookmarks and Tags tables? Should I do this differently?

Upvotes: 2

Views: 234

Answers (2)

Daniel Schilling
Daniel Schilling

Reputation: 4967

This is not a Cartesian product.

~ Figure A ~

Bookmarks -> Tags -> Tag

A Cartesian product is all of the possible combinations of two different sets. For example, suppose we had three tables: Customer, CustomerAddress, and CustomerEmail. Customers have many addresses, and they also have many email addresses.

~ Figure B ~

Customers -> Addresses
          -> Emails

If you wrote a query like...

select *
from
    Customer c
    left outer join CustomerAddress a
        on c.Id = a.Customer_id
    left outer join CustomerEmail e
        on c.Id = e.Customer_id
where c.Id = 12345

... and this customer had 5 addresses and 5 email addresses, you would wind up with 5 * 5 = 25 rows returned. You can see why this would be bad for performance. It is unnecessary data. Knowing every possible combination of Address and Email Address for a customer tells us nothing useful.

With your query, you are not returning any unnecessary rows. Every row in the result set corresponds directly to a row in one of the tables you're interested in, and vice-versa. There is no multiplication. Instead you have TagsBookmarksCount + BookmarksThatDontHaveTagsCount.

The key place to look for Cartesian products is when your query branches off into two separate unrelated collections. If you're just digging deeper and deeper into a single chain of child collections, as in Figure A, there is no Cartesian product. The number of rows your query returns will be limited by the number of rows returned by that deepest collection. As soon as you branch off to the side so that you now have two parallel, side-by-side collections in the query, as in Figure B, then you have a Cartesian product, and results will be unnecessarily multiplied.

To fix a Cartesian product, split the query into multiple queries so the number of rows are added, not multiplied. With NHibernate's Future methods, you can batch those separate queries together, so you still only have one round trip to the database. See one of my other answers for an example of how to fix a Cartesian product in NHibernate.

Upvotes: 1

dwerner
dwerner

Reputation: 6602

Query<>.Fetch() is intended to ensure that eager loading is taking place, and when it's a one-to-many relationship, as this appears to be ( i.e. if Bookmark.Tags is a collection) then the two ways you are going about this are roughly equivalent. If Tags is lazy-loaded and only accessed rarely, then leaving it non-fetched may be the best way to go (as in your first query), because you will not always be accessing the Tags much. This depends on use case.

If, on the other hand, you know that you will always be getting all the tags, it may make more sense to break this off into another query, this time on the whatever the Tags type/table is, and look them up instead of using the NHibernate relations to do the job.

If Tag has a foreign key to Bookmarks, like BookmarkId, ToLookup can be useful in this case:

var tagLookup = (from t in SessionFactory.GetCurrentSession().Query<Tag>()
                 // limit query appropriately for all the bookmarks you need
                 // this should be done once, in this optimization
                 select new {key=t.BookmarkId, value=t} )
                 .ToLookup(x=>x.key, x=>x.value);

Will give you a lookup (ILookup<int, Tag>) where you can do something like:

IGrouping<Tag> thisBookmarksTags = tagLookup[bookmarkId];

Which will give you the tags you need for that bookmark. This separates it out into another query, thereby avoiding N+1.

This is making quite a few assumptions about your data model, and the mappings, but I hope it illustrates a pretty straight-forward optimization that you can use.

Upvotes: 0

Related Questions