Eric Herlitz

Reputation: 26307

Sorting a generic list by an external sort order

I have a generic list

Simplified example

var list = new List<string>()
  {
    "lorem1.doc",
    "lorem2.docx",
    "lorem3.ppt",
    "lorem4.pptx",
    "lorem5.doc",
    "lorem6.doc",
  };

What I would like to do is sort these items based on the ordering of an external list.

For example:

var sortList = new[] { "pptx", "ppt", "docx", "doc" };

// Or
var sortList = new List<string>() { "pptx", "ppt", "docx", "doc" };

Is there anything built into LINQ that could help me achieve this, or do I have to go the foreach way?

Upvotes: 6

Views: 1920

Answers (5)

digEmAll

Reputation: 57220

Here's another way that does not use OrderBy:

var res = 
sortList.SelectMany(x => list.Where(f => Path.GetExtension(f).EndsWith(x)));

Note that the complexity of this approach is O(n * m), with n = sortList.Count and m = list.Count.

The OrderBy approach costs O(n * m) for the IndexOf calls plus O(m * log m) for the sort, since OrderBy evaluates the key selector once per element; in practice the IndexOf part is often cheaper than that bound suggests (IndexOf does not always take O(n)). However, with small n and m you won't notice any difference.

For big lists, the fastest way (complexity O(n + m)) could be to construct a temporary lookup, i.e.:

var lookup = list.ToLookup(x => Path.GetExtension(x).Remove(0,1));
var res = sortList.Where(x => lookup.Contains(x)).SelectMany(x => lookup[x]);
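
For reference, here is a minimal runnable sketch of the lookup idea using the sample data from the question. Two things in it are my own assumptions rather than part of the snippet above: TrimStart('.') is used to strip the dot so that a file name without an extension does not throw, and the Where(...) filter is left out because indexing a lookup with a missing key yields an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

var list = new List<string>
{
    "lorem1.doc", "lorem2.docx", "lorem3.ppt",
    "lorem4.pptx", "lorem5.doc", "lorem6.doc",
};
var sortList = new[] { "pptx", "ppt", "docx", "doc" };

// One pass over the files: group them by extension without the leading dot.
var lookup = list.ToLookup(f => Path.GetExtension(f).TrimStart('.'));

// One pass over the sort order: emit each group in turn.
// A key that is not in the lookup simply contributes nothing.
var res = sortList.SelectMany(ext => lookup[ext]).ToList();

Console.WriteLine(string.Join(", ", res));
// prints: lorem4.pptx, lorem3.ppt, lorem2.docx, lorem1.doc, lorem5.doc, lorem6.doc
```

Keep in mind that, as with the SelectMany version, files whose extension is not in sortList are dropped from the result rather than sorted to one end.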

Upvotes: 0

Tim Schmelter

Reputation: 460208

With the List<string> version of sortList, you can use IndexOf inside Enumerable.OrderBy:

var sorted = list.OrderBy(s => sortList.IndexOf(Path.GetExtension(s)));

So the index of the extension in sortList determines the priority in the other list. Unknown extensions get the highest priority (they sort first) since their index is -1.

But you need to add a dot to each entry in sortList to get it working, because Path.GetExtension returns the extension including the leading dot:

var sortList = new List<string>() { ".pptx", ".ppt", ".docx", ".doc" };

If that's not an option, you have to fiddle around with Substring or Remove, for example:

var sorted = list.OrderBy(s => sortList.IndexOf(Path.GetExtension(s).Remove(0,1)));
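
If unknown extensions should end up last instead of first, one option is to map the -1 from IndexOf to int.MaxValue. The following is only a sketch under my own assumptions: the file names "notes.txt" and "README" are made up for illustration, and TrimStart('.') replaces Remove(0, 1) so that a name without an extension (where Path.GetExtension returns an empty string) does not throw.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

var sortList = new List<string> { "pptx", "ppt", "docx", "doc" };

// Hypothetical input: one unknown extension and one file without any extension.
var files = new List<string> { "notes.txt", "lorem1.doc", "README", "lorem4.pptx" };

var sorted = files.OrderBy(f =>
{
    // Strip the leading dot; an empty extension stays empty instead of throwing.
    var index = sortList.IndexOf(Path.GetExtension(f).TrimStart('.'));
    return index < 0 ? int.MaxValue : index; // unknown extensions go last
}).ToList();

Console.WriteLine(string.Join(", ", sorted));
// prints: lorem4.pptx, lorem1.doc, notes.txt, README
```

OrderBy is a stable sort, so the two unknown files keep their original relative order.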

Upvotes: 8

Sergey Berezovskiy

Reputation: 236268

This solution will work even if some file names do not have extensions:

var sortList = new List<string>() { "pptx", "ppt", "docx", "doc" };
var list = new List<string>()
  {
    "lorem1.doc",
    "lorem2.docx",
    "lorem3.ppt",
    "lorem4.pptx",
    "lorem5.doc",
    "lorem6.doc",
  };

var result =
    list.OrderBy(f => sortList.IndexOf(Path.GetExtension(f).Replace(".", "")));

Upvotes: 6

Rawling

Reputation: 50144

A sortDictionary would be more efficient:

var sortDictionary = new Dictionary<string, int> {
    { ".pptx", 0 },
    { ".ppt" , 1 },
    { ".docx", 2 },
    { ".doc" , 3 } };

var sortedList = list.OrderBy(i => {
    var s = Path.GetExtension(i);
    int rank;
    if (sortDictionary.TryGetValue(s, out rank))
        return rank;
    return int.MaxValue; // for unknown at end, or -1 for at start
});

This way the lookup is O(1) rather than O(# of extensions).

Also, if you have a large number of filenames and a small number of extensions, it might actually be faster to group by extension first and sort the groups:

var sortedList = list
    .GroupBy(p => Path.GetExtension(p))
    .OrderBy(g => {
        int rank;
        if (sortDictionary.TryGetValue(g.Key, out rank))
            return rank;
        return int.MaxValue; // for unknown at end, or -1 for at start
    })
    .SelectMany(g => g);

This means the sort scales by the number of distinct extensions in the input, rather than the number of items in the input.

This also allows you to give two extensions the same priority.
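
To illustrate that last point, here is a small sketch in which both presentation formats share rank 0 and both document formats share rank 1. The file names are made up, and the case-insensitive comparer is an extra assumption on my part (so that ".DOCX" matches ".docx"); it is not required for the technique.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Several extensions can map to the same rank.
// StringComparer.OrdinalIgnoreCase is an assumption: it makes ".DOCX" match ".docx".
var sortDictionary = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
{
    { ".pptx", 0 }, { ".ppt", 0 },
    { ".docx", 1 }, { ".doc", 1 },
};

// Hypothetical input.
var files = new List<string> { "a.doc", "b.ppt", "c.DOCX", "d.txt", "e.pptx" };

var sortedList = files
    .OrderBy(f => sortDictionary.TryGetValue(Path.GetExtension(f), out var rank)
        ? rank
        : int.MaxValue) // unknown extensions go last
    .ToList();

Console.WriteLine(string.Join(", ", sortedList));
// prints: b.ppt, e.pptx, a.doc, c.DOCX, d.txt
```

Because OrderBy is stable, items with the same rank stay in their original input order.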

Upvotes: 1

MarcinJuraszek

Reputation: 125640

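You could try using the Array.IndexOf() method. This assumes sortList is the array version and that its entries include the leading dot (e.g. ".pptx"), because Path.GetExtension returns the extension with the dot:

var sortedList = list.OrderBy(i => Array.IndexOf(sortList, System.IO.Path.GetExtension(i))).ToList();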

Upvotes: 1
