Reputation: 6478
I have an application which allows a certain entity to be searched on by several different criteria (somewhere on the order of 20 different search methods in total). I want to be able to combine the results of several searches in order to produce a single result set.
For example:
results = (entities from search 1 AND entities from search 2) OR (entities from search 3)
Let us assume that the searches are complex enough that combining them into a single logical query is not possible (due to complex relationships that need to be queried, etc.).
Let us also assume that the number of entities involved (likely) makes any sort of in-memory strategy infeasible.
My initial thoughts were something along the lines of:
1) Perform the searches separately, obtain a list of matching "entity ids" from each of them, and then perform a "root-level" search based upon these.
For example:
select * from entity e
where
(e.Id in (search 1 id list) AND e.Id in(search 2 id list))
OR e.Id in (search 3 id list)
2) Perform an outer query that selects the entity based upon the results returned by my (complex) subqueries.
For example:
select * from entity e
where (e.Id in (select e1.id from entity e1 where ...) AND e.Id in (select e2.id from entity e2 where...))
OR e.Id in (select e3.id from entity e3 where...)
Obviously these examples are drastically simplified for illustration purposes; the individual queries will be much more involved, and the combination of them will be arbitrary (I've just illustrated a representative example here).
I'd be very interested in hearing suggestions for how others have handled this situation. I'm certainly open to any possibilities that I haven't explored above.
For reference, this is a .NET application making use of an NHibernate ORM backed by a SQL Server 2008 R2 database.
I've already decided on using either HQL or native SQL for this, as neither ICriteria nor LINQ provides the flexibility needed for the individual queries or the combining operations required.
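To illustrate option 2 a bit more concretely, here's roughly what I'm picturing via NHibernate's native SQL support (the subquery criteria below are trivial stand-ins for the real searches):
// Rough sketch only; the three WHERE clauses stand in for the real
// (much more involved) searches.
const string sql = @"
    select e.* from entity e
    where (e.Id in (select e1.Id from entity e1 where e1.Name like :mask)
      and  e.Id in (select e2.Id from entity e2 where e2.Status = :status))
       or  e.Id in (select e3.Id from entity e3 where e3.CategoryId = :category)";

var results = session.CreateSQLQuery(sql)
    .AddEntity(typeof(Entity))
    .SetString("mask", "%" + searchMask + "%")
    .SetInt32("status", status)
    .SetInt32("category", categoryId)
    .List<Entity>();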
Upvotes: 5
Views: 598
Reputation: 17385
My approach using LINQ is to build a list of where expressions representing the complex criteria, and to apply them together at the end.
Something like this:
// Collect the filter expressions that make up the search criteria.
List<Expression<Func<WorkItem, bool>>> whereExpressions = new List<Expression<Func<WorkItem, bool>>>();

if (!string.IsNullOrEmpty(searchMask))
{
    // Free-text mask: match against the name, the id, or the description.
    whereExpressions.Add(
        x => x.Name.ToLower().IndexOf(searchMask.ToLower()) > -1 ||
             x.Id.ToString().IndexOf(searchMask) > -1 ||
             (x.Description != null &&
              x.Description.ToLower().IndexOf(searchMask.ToLower()) > -1));
}

whereExpressions.Add(x => x.Status == status);
After building the expression list, you apply the expressions one by one:
IQueryable<WorkItem> result = Session.Linq<WorkItem>();
foreach (Expression<Func<WorkItem, bool>> whereExpression in whereExpressions)
{
    // Each Where() narrows the query; nothing executes until it is enumerated.
    result = result.Where(whereExpression);
}
You can also provide flexibility in the sorting method and allow paging:
IQueryable<WorkItem> items;
if (ascOrDesc == "asc")
{
    items = result.OrderBy(DecideSelector(indexer)).Skip(startPoint - 1).Take(numOfrows);
}
else
{
    items = result.OrderByDescending(DecideSelector(indexer)).Skip(startPoint - 1).Take(numOfrows);
}
where DecideSelector is defined like this:
private Expression<Func<WorkItem, object>> DecideSelector(string fieldCode)
{
    switch (fieldCode)
    {
        case "Deadline":
            return item => item.Deadline;
        case "name":
            return item => item.Name;
        case "WiStatus":
            return item => item.Status;
        case "WiAssignTo":
            return item => item.AssignedUser;
        default:
            return item => item.Id;
    }
}
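Tying it together, a call site might look something like this (the sort field and paging values are just examples):
IQueryable<WorkItem> query = Session.Linq<WorkItem>();
foreach (var whereExpression in whereExpressions)
{
    query = query.Where(whereExpression);
}

// First page of 20 rows, sorted ascending by deadline.
IList<WorkItem> firstPage = query
    .OrderBy(DecideSelector("Deadline"))
    .Skip(0)
    .Take(20)
    .ToList();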
Upvotes: 2
Reputation: 6800
If you can use ICriteria, I'd recommend it: it can drastically cut down on the amount of code in complex searches. For example, the only difference between using one search by itself and using it as a subquery in your aggregate search would be an added projection.
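A sketch of that, using NHibernate's DetachedCriteria (the entity and property names here are made up):
using NHibernate;
using NHibernate.Criterion;

// Each individual search is a DetachedCriteria projecting the entity id,
// so it can be dropped into the aggregate search as a subquery.
DetachedCriteria search1 = DetachedCriteria.For<Entity>("e1")
    .Add(Restrictions.Like("Name", mask, MatchMode.Anywhere))
    .SetProjection(Projections.Id());

DetachedCriteria search2 = DetachedCriteria.For<Entity>("e2")
    .Add(Restrictions.Eq("Status", status))
    .SetProjection(Projections.Id());

DetachedCriteria search3 = DetachedCriteria.For<Entity>("e3")
    .Add(Restrictions.Eq("CategoryId", categoryId))
    .SetProjection(Projections.Id());

// (search1 AND search2) OR search3, mirroring the question's example.
IList<Entity> results = session.CreateCriteria<Entity>()
    .Add(Restrictions.Or(
        Restrictions.And(
            Subqueries.PropertyIn("Id", search1),
            Subqueries.PropertyIn("Id", search2)),
        Subqueries.PropertyIn("Id", search3)))
    .List<Entity>();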
I haven't yet tried splitting complex searches up and running them separately. Combining the entire search into one call to the database, as in your second example, has worked for me so far. When I'm not getting a decent response time (minutes rather than seconds), the Database Engine Tuning Advisor has proved invaluable with suggested indexes and statistics.
Upvotes: 0
Reputation: 5834
I've done this by keeping search performance counters in a table, essentially monitoring the average percentage of rows that each search filters out, along with its run time.
I then compute a performance figure: TotalNumberOfRowsToSearch * Percent_Not_Matched / RunTimeInSeconds. This figure corresponds directly to the number of rows per second the query can filter out; averaged over thousands of runs, it is a rather good predictor.
I then run the queries in order, starting with the one that has the highest performance figure.
If you're doing a logical AND on the total result, run each subsequent query only on the results of the previous query.
If you're doing a logical OR, run each subsequent query only on the results NOT IN the combined previous search results.
By doing it this way, your query will change based on indexes and types of data.
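A rough sketch of that chaining (the search delegates are hypothetical; each one executes a single search and returns the ids it matched):
using System;
using System.Collections.Generic;

// AND: each search is restricted to the candidate ids from the previous step.
public static ISet<int> CombineAnd(IEnumerable<Func<ISet<int>, ISet<int>>> searchesByFigure)
{
    ISet<int> candidates = null;           // null = no restriction yet
    foreach (var search in searchesByFigure)
    {
        candidates = search(candidates);
        if (candidates.Count == 0)
            break;                         // short-circuit once nothing is left
    }
    return candidates ?? new HashSet<int>();
}

// OR: each search excludes the ids already matched by earlier searches.
public static ISet<int> CombineOr(IEnumerable<Func<ISet<int>, ISet<int>>> searchesByFigure)
{
    var matched = new HashSet<int>();
    foreach (var search in searchesByFigure)
        matched.UnionWith(search(matched));
    return matched;
}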
If you want a less dynamic solution, simply calculate performance figures for each part of the search and use the better-performing ones first. Remember, a query that runs in 55 ms but matches 99% of the rows is not as useful as one that runs in 1 second but matches only 1%, so be wary that the results may go against your initial ideas.
Just look out for the divide-by-zero error when calculating performance figures.
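For reference, the figure calculation with that guard might look like this (names are made up):
// Rows per second the query can filter out; clamp very fast runs so the
// division never hits zero.
public static double PerformanceFigure(long totalRowsToSearch,
                                       double percentNotMatched,
                                       double runTimeInSeconds)
{
    if (runTimeInSeconds <= 0.0)
        runTimeInSeconds = 0.001;
    return totalRowsToSearch * percentNotMatched / runTimeInSeconds;
}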
Upvotes: 2