Will this NHibernate query impact performance?

Question

I am creating a website in ASP.NET MVC and use NHibernate as ORM. I have the following tables in my database:

Bookmarks
TagsBookmarks (junction table)
Tags

Mapping:

    public BookmarkMap()
    {
        Table("Bookmarks");
        Id(x => x.Id).Column("Id").GeneratedBy.Identity();
        Map(x => x.Title);
        Map(x => x.Link);
        Map(x => x.DateCreated);
        Map(x => x.DateModified);
        References(x => x.User, "UserId");
        HasManyToMany(x => x.Tags).AsSet().Cascade.None().Table("TagsBookmarks").ParentKeyColumn("BookmarkId")
        .ChildKeyColumn("TagId");
    }

    public TagMap()
    {
        Table("Tags");
        Id(x => x.Id).Column("Id").GeneratedBy.Identity();
        Map(x => x.Title);
        Map(x => x.Description);
        Map(x => x.DateCreated);
        Map(x => x.DateModified);
        References(x => x.User, "UserId");
        HasManyToMany(x => x.Bookmarks).AsSet().Cascade.None().Inverse().Table("TagsBookmarks").ParentKeyColumn("TagId")
        .ChildKeyColumn("BookmarkId");
    }

I need the data from both the Bookmarks and Tags table. More specific: I need 20 bookmarks with their related tags. The first thing I do is select 20 bookmark ids from the Bookmarks table. I do this because paging doesn't work well on a cartesian product that I get in the second query.

First query:

IEnumerable bookmarkIds = (from b in SessionFactory.GetCurrentSession().Query()
                                where b.User.Username == username
                                orderby b.DateCreated descending
                                select b.Id).Skip((page - 1) * pageSize).Take(pageSize).ToList();

After that I select the bookmarks for these ids.

Second query:

IEnumerable bookmarks = (from b in SessionFactory.GetCurrentSession().Query().Fetch(t => t.Tags)
                                   where b.User.Username == username && bookmarkIds.Contains(b.Id)
                                   orderby b.DateCreated descending
                                   select b);

The reason I use fetch is because I want to avoid N+1 queries. This works but results in a cartesian product. I have read in some posts that you should avoid cartesian products, but I don't really know how to do this in my case.

I have also read something about setting a batch size for the N+1 queries. Is this really faster than this single query?

An user can add max 5 tags to a bookmark. I select 20 bookmarks per page so worst case scenario for this second query is: 5 * 20 = 100 rows.

Will this impact performance when I have lots of data in the Bookmarks and Tags tables? Should I do this differently?

Daniel Schilling · Accepted Answer

This is not a Cartesian product.

~ Figure A ~
Bookmarks -> Tags -> Tag

A Cartesian product is all of the possible combinations of two different sets. For example, suppose we had three tables: Customer, CustomerAddress, and CustomerEmail. Customers have many addresses, and they also have many email addresses.

~ Figure B ~

Customers -> Addresses
          -> Emails

If you wrote a query like...

select *
from
    Customer c
    left outer join CustomerAddress a
        on c.Id = a.Customer_id
    left outer join CustomerEmail e
        on c.Id = e.Customer_id
where c.Id = 12345

... and this customer had 5 addresses and 5 email addresses, you would wind up with 5 * 5 = 25 rows returned. You can see why this would be bad for performance. It is unnecessary data. Knowing every possible combination of Address and Email Address for a customer tells us nothing useful.

With your query, you are not returning any unnecessary rows. Every row in the result set corresponds directly to a row in one of the tables you're interested in, and vice-versa. There is no multiplication. Instead you have TagsBookmarksCount + BookmarksThatDontHaveTagsCount.

The key place to look for Cartesian products is when your query branches off into two separate unrelated collections. If you're just digging deeper and deeper into a single chain of child collections, as in Figure A, there is no Cartesian product. The number of rows your query returns will be limited by the number of rows returned by that deepest collection. As soon as you branch off to the side so that you now have two parallel, side-by-side collections in the query, as in Figure B, then you have a Cartesian product, and results will be unnecessarily multiplied.

To fix a Cartesian product, split the query into multiple queries so the number of rows are added, not multiplied. With NHibernate's Future methods, you can batch those separate queries together, so you still only have one round trip to the database. See one of my other answers for an example of how to fix a Cartesian product in NHibernate.

Will this NHibernate query impact performance?

Answers (2)

Related Questions