Hierarchy in SQL and ASP.net

I am creating a web page which contains Posts and Comments. When I retrieve the Comments for each Post during the rendering of that Post, however, the retrieval is very slow. My guess is that this is due to opening/closing connections frequently. Regardless, it doesn't scale well.

To fix this, I am considering merging the two entities into one, and modeling Posts and Comments in a polymorphous way. In other words, Entry becomes a super-class, sub-classed by Post and Comment.

Can anyone give me some advice as to whether or not this is a valid approach? Alternatively, I am open to other suggestions which might also solve the nested Repeater/DataList performance issue which I may not have thought of.

Upvotes: 1

Views: 78

Answers (2)

Troy Alford
Troy Alford

Reputation: 27236

To expand on what nunespascal said:

Whether you're pulling the data back as two separate entities via two separate tables OR you're pulling it back polymorphically out of the same table, it sounds like the issue you are having is around how you are requesting the data.

Consider these two approaches in C#-ish EF pseudo-code:

Approach 1: Iteratively Load Children

var posts = db.Posts;
foreach (Post p in posts) {
    Html.Render(p);

    var comments = db.Comments.Where(c => c.PostId == p.PostId);
    foreach (Comment c in comments) {
        Html.Render(c);
    }
}

This is what it sounds like you're essentially doing in your current repeater iteration. For each post you come to, load the comments which belong to it - and render them out.

This creates exactly the bottle-neck you were describing, in which you are opening/closing a lot of connections and running a lot of separate, atomic SQL statements. Avoid this if you can.

Approach 2: Load Parents/Children Together

var posts = db.Posts.Top(10);
// ^ The .Top(10) limits the result-set to a manageable number of posts (10).

var ids = posts.Select(p => p.PostId);
// ^ This line creates an Enumerable list of the IDs of your loaded posts.

var comments = db.Comments.Where(c => ids.Contains(c.PostId));
// ^ This line loads all the comments which belong to the 10 posts you loaded.

foreach (Post p in posts) {
    Html.Render(p);

    foreach (Comment c in comments.Where(c => c.PostId == p.PostId)) {
        // This loop iterates the comments in the same way the previous example
        // showed, with the exception that it iterates *in memory*, rather than
        // running additional SQL on each iteration.
        Html.Render(c);
    }
}

Because you're loading all of the items in memory in the second example, you save all the round trips - making only 2 SQL statements, both of which are run at the start.

If you are actually using EF4/5 (which the above code is based on), you could actually even do the following:

var posts = db.Posts.Include("Comments");
// ^ This line loads both Posts and Comments in a single SQL statement

foreach (Post p in posts) {
    Html.Render(p);

    foreach (Comment c in p.Comments) {
        // EF5 and other ORMs can create a .Comments property which contains an
        // Enumerable list of the comments, already filtered on the FK binding
        Html.Render(c);
    }
}

Upvotes: 1

nunespascal
nunespascal

Reputation: 17724

Even if you had a blog with thousands of comments on the post, it would still work well.
RDBMS tables often run into millions of records.

Opening and closing connections frequently will be a performance bottleneck.
You should fetch all the data at once, and then pass that to your nested repeater.

Upvotes: 1

Related Questions