Vikas

Reputation: 805

LINQ OrderBy taking a really long time

I am kind of stuck again as I am unable to understand this.

So I have a class named CSVItem:

public class CSVItem
{
    public int SortedAccountNumber { get; set; }
    public DateTime Date { get; set; }
    public int SNO { get; set; }
    public string AccountNumber { get; set; }
    public double Value { get; set; }
    public int Year
    {
        get
        {
            if (Date.Month > MainWindow.fiscalMonth)
            {
                return Date.Year+1;
            }
            return Date.Year;
        }
    }
    public int StaticCounter { get { return 1; } }

    public CSVItem(string accNo, DateTime date, double value, int sNo)
    {
        Value = value;
        Date = date;
        AccountNumber = accNo;
        SNO = sNo;
    }
}

I read a CSV file and build a List<CSVItem> with about 500k items. Then I sort it with LINQ's OrderBy and convert the sorted sequence back into a list. Here is the code:

List<CSVItem> items = new List<CSVItem>();

// ---- some code to read csv and load into items collection

List<CSVItem> vItems = items.OrderBy(r1 => r1.AccountNumber).ThenBy(r1 => r1.Date).ToList();

Sorting and then converting the collection back to a list takes forever. I have loaded about a million records before and never had LINQ sorting become unresponsive like this, and it is driving me crazy. Any help or suggestions on where to look for the problem?

Upvotes: 1

Views: 2488

Answers (1)

greenhoorn

Reputation: 1561

You can use AsParallel() to your advantage.

List<CSVItem> vItems = items.AsParallel().OrderBy(r1 => r1.AccountNumber).ThenBy(r1 => r1.Date).ToList();
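Whether AsParallel() actually helps depends on the machine and the data, so it is worth measuring. Below is a minimal sketch, not the asker's code: Row, MakeRows and Measure are hypothetical stand-ins, and the data is synthetic. It also times a variant that passes StringComparer.Ordinal, on the assumption that the account numbers do not need culture-aware ordering; the default string comparison used by OrderBy is culture-sensitive, which is often slower than an ordinal comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class SortTiming
{
    // Hypothetical stand-in for CSVItem that carries only the two sort keys.
    public sealed class Row
    {
        public string AccountNumber { get; set; }
        public DateTime Date { get; set; }
    }

    // Builds synthetic rows with repeated account numbers (assumed data shape).
    public static List<Row> MakeRows(int count)
    {
        var random = new Random(1);
        var start = new DateTime(2015, 1, 1);
        return Enumerable.Range(0, count)
            .Select(_ => new Row
            {
                AccountNumber = random.Next(0, 10_000).ToString("D6"),
                Date = start.AddDays(random.Next(0, 365))
            })
            .ToList();
    }

    // Runs the sort, materializes the result, and returns the elapsed milliseconds.
    public static long Measure(Func<List<Row>> sort, out List<Row> sorted)
    {
        var watch = Stopwatch.StartNew();
        sorted = sort();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        List<Row> rows = MakeRows(500_000);

        // Sequential sort with the default (culture-sensitive) string comparison.
        long sequentialMs = Measure(
            () => rows.OrderBy(r => r.AccountNumber).ThenBy(r => r.Date).ToList(),
            out _);

        // Sequential sort with an ordinal string comparison.
        long ordinalMs = Measure(
            () => rows.OrderBy(r => r.AccountNumber, StringComparer.Ordinal).ThenBy(r => r.Date).ToList(),
            out _);

        // Parallel sort via PLINQ.
        long parallelMs = Measure(
            () => rows.AsParallel().OrderBy(r => r.AccountNumber).ThenBy(r => r.Date).ToList(),
            out _);

        Console.WriteLine($"sequential: {sequentialMs} ms, ordinal: {ordinalMs} ms, parallel: {parallelMs} ms");
    }
}
```

The numbers will vary between runs and machines, so treat them as a rough comparison only. If none of the variants is slow on synthetic data, the delay may come from somewhere other than the sort itself.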

The question arose whether parallelizing OrderBy() has side effects when it is followed by a ThenBy().

When does AsParallel() split the IEnumerable? There are two possible answers. Let's take the following query:

items.AsParallel().OrderBy(x=>x.Age).ThenBy(x=>x.Size)

Option 1

The items are split, each part is ordered by age and then by size on its own, and finally the parts are merged back into one list. Obviously not what we want.

Option 2

The items are split, each part is ordered by age, and the parts are merged back into one list. After that, the items are split again, ordered by size, and merged back into one list. That's what we want.
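To make the concern behind Option 1 concrete, here is a small sketch that simulates it by hand rather than through PLINQ. SortChunksIndependently is a hypothetical helper that takes Option 1 literally: it sorts fixed-size chunks on their own and simply concatenates them, with no key-aware merge. It is only an illustration of why that would be wrong, not a description of what PLINQ does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitSortDemo
{
    // Hypothetical helper: sorts each chunk independently and appends the
    // chunks in their original order, without merging by key.
    public static List<(int Age, int Size)> SortChunksIndependently(
        IReadOnlyList<(int Age, int Size)> source, int chunkSize)
    {
        var result = new List<(int Age, int Size)>();
        for (int start = 0; start < source.Count; start += chunkSize)
        {
            result.AddRange(source
                .Skip(start)
                .Take(chunkSize)
                .OrderBy(x => x.Age)
                .ThenBy(x => x.Size));
        }
        return result;
    }

    // Reference result: one sort over the whole list.
    public static List<(int Age, int Size)> SortGlobally(
        IReadOnlyList<(int Age, int Size)> source)
    {
        return source.OrderBy(x => x.Age).ThenBy(x => x.Size).ToList();
    }

    public static void Main()
    {
        var data = new List<(int Age, int Size)>
        {
            (23, 42), (1, 12), (7, 32), (2, 1), (7, 5), (5, 155)
        };

        var chunked = SortChunksIndependently(data, 3);
        var global = SortGlobally(data);

        Console.WriteLine(string.Join(" ", chunked));
        Console.WriteLine(string.Join(" ", global));
        Console.WriteLine(chunked.SequenceEqual(global)); // prints False
    }
}
```

Each chunk is sorted internally, but the concatenated list is not, which is the outcome the answer wants to rule out for the parallel query.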

I created a small example to check which one is true.

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    static void Main(string[] args)
    {
        List<TestItem> items = new List<TestItem>();
        List<TestItem> itemsNonParallel = new List<TestItem>();

        items.Add(new TestItem() { Age = 1, Size = 12 });
        items.Add(new TestItem() { Age = 2, Size = 1 });
        items.Add(new TestItem() { Age = 5, Size = 155 });
        items.Add(new TestItem() { Age = 23, Size = 42 });
        items.Add(new TestItem() { Age = 7, Size = 32 });
        items.Add(new TestItem() { Age = 9, Size = 22 });
        items.Add(new TestItem() { Age = 34, Size = 11 });
        items.Add(new TestItem() { Age = 56, Size = 142 });
        items.Add(new TestItem() { Age = 300, Size = 13 });

        itemsNonParallel.Add(new TestItem() { Age = 1, Size = 12 });
        itemsNonParallel.Add(new TestItem() { Age = 2, Size = 1 });
        itemsNonParallel.Add(new TestItem() { Age = 5, Size = 155 });
        itemsNonParallel.Add(new TestItem() { Age = 23, Size = 42 });
        itemsNonParallel.Add(new TestItem() { Age = 7, Size = 32 });
        itemsNonParallel.Add(new TestItem() { Age = 9, Size = 22 });
        itemsNonParallel.Add(new TestItem() { Age = 34, Size = 11 });
        itemsNonParallel.Add(new TestItem() { Age = 56, Size = 142 });
        itemsNonParallel.Add(new TestItem() { Age = 300, Size = 13 });

        foreach (var item in items.AsParallel().OrderBy(x => x.Age).ThenBy(x => x.Size))
        {
            Console.WriteLine($"Age: {item.Age}     Size: {item.Size}");
        }

        Console.WriteLine("---------------------------");

        foreach (var item in itemsNonParallel.OrderBy(x => x.Age).ThenBy(x => x.Size))
        {
            Console.WriteLine($"Age: {item.Age}     Size: {item.Size}");
        }

        Console.ReadLine();        
    }
}

public class TestItem
{
    public int Age { get; set; }
    public int Size { get; set; }
}

Result

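AsParallel() does what we want. It first processes the OrderBy() in parallel, merges the list back together, and then moves on to the next part of the query, in our case ThenBy(). I ran this multiple times and always got the same result. Note that the sample data contains no duplicate Age values, so ThenBy() never has to break a tie in this test.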
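Since the sample data has no repeated Age values, a stronger check would use data where ThenBy() has ties to break. The following is a minimal sketch under that assumption: MakeItems and ParallelMatchesSequential are hypothetical helpers, and value tuples stand in for TestItem so that two entries with the same Age and Size compare as equal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TieCheck
{
    // Hypothetical helper: random (Age, Size) pairs with many repeated Age values.
    public static List<(int Age, int Size)> MakeItems(int count, int seed)
    {
        var random = new Random(seed);
        return Enumerable.Range(0, count)
            .Select(_ => (Age: random.Next(0, 50), Size: random.Next(0, 1000)))
            .ToList();
    }

    // Sorts the same data sequentially and with PLINQ and compares the results.
    public static bool ParallelMatchesSequential(List<(int Age, int Size)> items)
    {
        var sequential = items.OrderBy(x => x.Age).ThenBy(x => x.Size).ToList();
        var parallel = items.AsParallel().OrderBy(x => x.Age).ThenBy(x => x.Size).ToList();
        return sequential.SequenceEqual(parallel);
    }

    public static void Main()
    {
        var items = MakeItems(100_000, 12345);
        Console.WriteLine(ParallelMatchesSequential(items)); // prints True
    }
}
```

The comparison only looks at the two sort keys. The PLINQ documentation describes its OrderBy() as not being a stable sort, so items that are equal on every sort key may come back in a different relative order than in the sequential query; that matters only if such items differ in other fields.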

Upvotes: 2
