Sweeper

Reputation: 271060

How can I simplify this LINQ query that searches for keywords in strings and orders them by relevance?

Let's say I have some MyObjects, each of which has a Description property. I have a list of keywords that I want to use to search through the MyObject list. I want to order the matching objects in descending order by the number of keywords that each Description contains.

Sample input (only showing the Description property, note the initial order):

"Foo Bar"
"Foo Boo"
"Bar Bar"

Sample keywords:

"Boo", "Foo"

Sample output (only showing the Description property, note the final order):

"Foo Boo" (matches 2 keywords)
"Foo Bar" (matches 1 keyword)

"Bar Bar" is not in the results because it matches 0 keywords.

I am currently using this very complicated chain of methods:

return keywords.SelectMany(
    x => MyObjects.Where(y => y.Description.ToLowerInvariant().Contains(x.ToLowerInvariant()))
    )
    .GroupBy(x => x)
    .OrderByDescending(x => x.Count())
    .Select(x => x.Key).ToList();

As you can see, I start by selecting on keywords. As a reader of the code, I think you would expect to see MyObjects being transformed first. When I write LINQ, I usually try to visualize what the operations will look like, and seeing the keywords being transformed feels counter-intuitive. I also don't like the nested query in SelectMany, because it makes the query-syntax version look very ugly:

var query = from keyword in keywords
            from matchedObjects in (from obj in MyObjects where obj.Description.ToLowerInvariant().Contains(keyword.ToLowerInvariant()) select obj)
            group matchedObjects by matchedObjects into sameObjects
            orderby sameObjects.Count() descending
            select sameObjects.Key;
return query.ToList();

How can I improve the LINQ query? Ideally:

I would expect there is an easier/more intuitive way because this seems like a trivial thing to do, but I would also accept that there is no easier way if an explanation is provided.

Upvotes: 1

Views: 124

Answers (2)

Krzysztof Skowronek

Reputation: 2936

Try this:

    var objects = new[]
    {
        new MyObject { Description = "Foo Bar" },
        new MyObject { Description = "Foo Boo" },
        new MyObject { Description = "Foo Bee" },
        new MyObject { Description = "Bar Bee" },
        new MyObject { Description = "Boo Bee" },
    };
    var keywords = new[] { "Foo", "Bar" };
    var results = objects
        // group by the number of keywords found in each description
        .GroupBy(x => keywords.Count(keyword => x.Description.Contains(keyword)))
        .Where(x => x.Key > 0) // discard objects with no matches
        // .OrderByDescending(x => x.Count()) // alternative: order by the number of objects in each group
        .OrderByDescending(x => x.Key) // most keyword matches first
        // .ToDictionary(x => x.Key, x => x.ToArray()) // alternative: materialize as a dictionary
        .Select(x => new { Count = x.Key, Objects = x.ToArray() }) // or create an anonymous type
        .ToList();

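It groups the objects by the number of keywords they match, discards the group with no matches, and puts the groups with the most matches first. Note that Contains is case-sensitive here, unlike the ToLowerInvariant comparison in the question.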
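If a flat list is wanted, as in the question, rather than one entry per match count, the groups can probably be flattened again with SelectMany. This is only a sketch under a few assumptions: anonymous objects stand in for MyObject so that the snippet is self-contained, and it uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Anonymous objects stand in for MyObject (an assumption made so the snippet compiles on its own).
var objects = new[]
{
    new { Description = "Foo Bar" },
    new { Description = "Foo Boo" },
    new { Description = "Foo Bee" },
    new { Description = "Bar Bee" },
    new { Description = "Boo Bee" },
};
var keywords = new[] { "Foo", "Bar" };

var flat = objects
    // group by how many keywords each description contains (case-sensitive)
    .GroupBy(x => keywords.Count(keyword => x.Description.Contains(keyword)))
    .Where(g => g.Key > 0)            // discard the zero-match group
    .OrderByDescending(g => g.Key)    // groups with the most matches first
    .SelectMany(g => g)               // flatten the groups back into one sequence
    .Select(x => x.Description)
    .ToList();

Console.WriteLine(string.Join(", ", flat));
// prints "Foo Bar, Foo Boo, Foo Bee, Bar Bee"
```

GroupBy keeps elements within a group in their original order, so objects with the same match count stay in input order.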

Upvotes: 1

cgt_mky

Reputation: 196

Does

results = myObjects.OrderByDescending(myObject => keywords.Where(keyword => myObject.Description.Contains(keyword)).Count());

give you what you need?

EDIT:

var temp = myObjects.Where(myObject => keywords.Any(keyword => myObject.Description.Contains(keyword)))
            .OrderByDescending(myObject => keywords.Where(keyword => myObject.Description.Contains(keyword)).Count());

Not sure if this counts as 'better' or not.
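One possible variation on the edit above, offered as a sketch rather than a definitive answer: project each object together with its match count first, so the keywords are only scanned once per object, then filter and sort on that count. Anonymous objects stand in for MyObject here, the property names Object and Matches are made up for illustration, and IndexOf with StringComparison.OrdinalIgnoreCase is my assumption for reproducing the case-insensitive matching from the question.

```csharp
using System;
using System.Linq;

// Sample data from the question; anonymous objects stand in for MyObject.
var myObjects = new[]
{
    new { Description = "Foo Bar" },
    new { Description = "Foo Boo" },
    new { Description = "Bar Bar" },
};
var keywords = new[] { "Boo", "Foo" };

var results = myObjects
    // pair each object with its keyword-match count, computed once
    .Select(obj => new
    {
        Object = obj,
        Matches = keywords.Count(keyword =>
            obj.Description.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
    })
    .Where(x => x.Matches > 0)            // drop objects that match no keyword
    .OrderByDescending(x => x.Matches)    // most matches first
    .Select(x => x.Object)
    .ToList();

foreach (var obj in results)
    Console.WriteLine(obj.Description);
// prints "Foo Boo", then "Foo Bar"
```

This starts from the objects rather than the keywords, which seems closer to the reading order the question asks for.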

Upvotes: 2
