GreenEyedAndy

Reputation: 1525

Optimizing Linq

I have a file with the following content:

Aulin:            Performance Enhancers, Combat Stabilisers
i Bootis:         Fish, Basic Medicines
Aulin:            Agricultural Medicines, Combat Stabilisers
Eranin:           Tea, Coffee
LP 98-132:        Bertrandite,Gold
Dahan:            Tantalum, Explosives
Asellus Primus:   Resonant Separators, Non Lethal Weapons
LHS 3006:         Bertrandite, Indite

These are values from a famous game.

Now I read the data and convert it into a Dictionary with the following code:

var imports = File.ReadAllText(@"d:\exports.txt");
var productsDict = 
    imports
    .Split('\n')
    .Select(line => line.Split(':'))
    .GroupBy(line => line[0])
    .ToDictionary(
        line => line.Key, 
        line => 
           line.Select(item => item[1])
           .Aggregate((c, n) => c.Insert(c.Length, "," + n))
           .Split(',')
           .Select(i => i.Trim(' '))
           .Distinct()
    );

Can I optimize the LINQ, and when must I use new identifiers for the chained lambdas? As you can see, I mess around with line, item, i and so on.

Upvotes: 0

Views: 142

Answers (3)

Shlomi Borovitz

Reputation: 1700

First, instead of loading the entire file into memory, you can read it lazily, one line at a time.

var lines = File.ReadLines(@"d:\exports.txt");

Then, you're missing many tools in the framework that could be used (like string.Join or SelectMany), as well as the simpler query syntax:

var productDict = (from line in lines
                   let keyValue = line.Split(':')
                   let lineValues = from value in keyValue[1].Split(',')
                                    select value.Trim()
                   group lineValues by keyValue[0] into entry
                   select new
                   {
                       entry.Key,
                       Entry = (from values in entry
                                from value in values
                                select value).Distinct(),
                   })
                   .ToDictionary(
                       entry => entry.Key,
                       entry => string.Join(",", entry.Entry));
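
As a rough sketch of the SelectMany route mentioned above, the same grouping can also be written in method syntax. The names ProductParser and Parse are made up for illustration. Unlike the query above, this version keeps each value as a list instead of a joined string, and it assumes that lines without a colon should simply be skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProductParser
{
    // Hypothetical helper: groups "System: item, item" lines by system name
    // and collects the distinct, trimmed items for each system.
    public static Dictionary<string, List<string>> Parse(IEnumerable<string> lines)
    {
        return lines
            .Where(line => line.IndexOf(':') >= 0)           // assumption: skip blank or malformed lines
            .Select(line => line.Split(new[] { ':' }, 2))    // parts[0] = system, parts[1] = item list
            .GroupBy(parts => parts[0].Trim(), parts => parts[1])
            .ToDictionary(
                group => group.Key,
                group => group
                    .SelectMany(items => items.Split(','))   // flatten all item lists of one system
                    .Select(item => item.Trim())             // Trim() also drops a trailing '\r'
                    .Distinct()
                    .ToList());
    }
}
```

It could be called as ProductParser.Parse(File.ReadLines(@"d:\exports.txt")). On the naming question: sibling lambdas in a chain can reuse a parameter name, since each one has its own scope, while a nested lambda is safest with a name different from the enclosing lambda's parameter. Naming each parameter after what it holds (line, parts, group, item) is purely a readability choice.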

Upvotes: 0

user2160375

Reputation:

The fastest way (and not very hard to implement), if parsing performance is the concern, is to parse the file manually, line by line and then character by character. Steps:

  1. While not at EOF (end of file):
  2. Read a single line.
  3. Read chars until the first : appears.
  4. Remember the word just read as the key.
  5. While not at EOL (end of line): read chars until a comma, then put the word just read on a list; repeat step 5.
  6. Put a dictionary entry (the key is the first word read, the value is the list of words from step 5).
  7. Go back to step 1.
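
The steps above could be sketched roughly as follows. This is only an illustration: the names ManualParser and Parse are invented here. It also assumes that repeated keys (such as Aulin in the sample data) should be merged into one entry without duplicate values, which the steps do not spell out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ManualParser
{
    // Hypothetical helper that follows the numbered steps; not a tuned implementation.
    public static Dictionary<string, List<string>> Parse(TextReader reader)
    {
        var result = new Dictionary<string, List<string>>();
        string line;
        while ((line = reader.ReadLine()) != null)            // steps 1-2: read until EOF
        {
            int colon = line.IndexOf(':');                     // step 3: find the first ':'
            if (colon < 0) continue;                           // assumption: skip lines without a key
            string key = line.Substring(0, colon).Trim();      // step 4: remember the key

            List<string> values;
            if (!result.TryGetValue(key, out values))          // step 6: one entry per key
            {
                values = new List<string>();
                result[key] = values;
            }

            int start = colon + 1;
            for (int i = start; i <= line.Length; i++)         // step 5: scan to the end of the line
            {
                if (i == line.Length || line[i] == ',')
                {
                    string word = line.Substring(start, i - start).Trim();
                    if (word.Length > 0 && !values.Contains(word))
                        values.Add(word);                      // assumption: drop duplicates
                    start = i + 1;
                }
            }
        }
        return result;
    }
}
```

Passing a TextReader keeps the sketch testable with a StringReader; for the real file, a StreamReader opened on d:\exports.txt would be passed instead.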

If performance is not the main concern in your case (the file is not very large), you could group some of the LINQ calls into separate methods (e.g. extension methods) like:

public static IEnumerable<string> SplitByLine(this string text)
{
   return text.Split('\n');
}

public static IEnumerable<string[]> KeyValuesSplitted(this IEnumerable<string> lines)
{
   return lines.Select(line => line.Split(':'));
}

public static IEnumerable<IGrouping<string, string[]>> GroupyByKey(this IEnumerable<string[]> keyValuesSplitted)
{
   return keyValuesSplitted.GroupBy(line => line.First());
}

// and so on..

Usage:

productsDict = imports.SplitByLine()
                      .KeyValuesSplitted()
                      .GroupyByKey() //and so on.

In this case each method is easy to understand and we know what is going on while importing.

Upvotes: 2

Jeroen1984

Reputation: 1686

You could introduce some local variables to make things more readable. Another trick to make the code look nicer and more readable is to put every function call in your chain on a new line. That way you can read your code from top to bottom instead of scrolling to the right.

And you should really wrap this in a method called something like ConvertData() or ImportData(), maybe even in a separate class. That way your code also becomes more readable, since the caller doesn't have to know how the import is done and isn't distracted by the complex code.

You can do something like this:

var imports = File.ReadAllText(@"d:\exports.txt");
var productsDict = importData(imports);

And then in a separate method, or maybe even better in a separate class (unless this class is only responsible for importing data):

private Dictionary<string, IEnumerable<string>> importData(string imports)
{
    return imports.Split('\n')
        .Select(line => line.Split(':'))
        .GroupBy(line => line[0])
        .ToDictionary(
            line => line.Key,
            line => line.Select(item => item[1])
                        .Aggregate((c, n) => c.Insert(c.Length, "," + n))
                        .Split(',')
                        .Select(i => i.Trim(' '))
                        .Distinct());
}

Even better, but maybe out of the question's scope, is to create an IImporter interface with an import method defined on it, so that later you can swap your importer class, or support multiple importers, without breaking other parts of your code.
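
A minimal sketch of what such an interface might look like, assuming the importer receives a file path. Only the name IImporter comes from the text above. The Import signature, the TextFileImporter class and the decision to skip lines without a colon are assumptions made for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical contract: callers depend on this, not on a concrete parser.
public interface IImporter
{
    Dictionary<string, IEnumerable<string>> Import(string path);
}

// One possible implementation for the "System: item, item" text format.
public class TextFileImporter : IImporter
{
    public Dictionary<string, IEnumerable<string>> Import(string path)
    {
        return File.ReadLines(path)
            .Where(line => line.IndexOf(':') >= 0)            // assumption: ignore malformed lines
            .Select(line => line.Split(new[] { ':' }, 2))
            .GroupBy(parts => parts[0].Trim(), parts => parts[1])
            .ToDictionary(
                group => group.Key,
                group => group.SelectMany(items => items.Split(','))
                              .Select(item => item.Trim())
                              .Distinct());
    }
}
```

Calling code would then hold an IImporter (for example IImporter importer = new TextFileImporter();) and call importer.Import(@"d:\exports.txt"), so a different importer for another format could be swapped in without touching the caller.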

Upvotes: 0
