Reputation: 791

Using Lists and Linq, writing efficient code

I have created a quick console application, which creates 10000 younger people and 10000 older people and adds them to two separate lists. I then perform some queries to obtain information based on personalities.

class Program
{
    static void Main(string[] args)
    {
        private Random random = new Random();
        private List<Person> youngerPersons = new List<Person>();
        private List<Person> olderPersons = new List<Person>();

        private long samePersonalityMatches = 0;

        for (int i = 0; i < 10000; i++)
        {
            youngerPersons.Add(new Person(RandomString(10), DateTime.Now.ToString(), RandomString(4), random.Next(10, 50),(Person.PersonalityType)random.Next(0, 4), i));
        }

        for (int i = 0; i < 10000; i++)
        {
            olderPersons.Add(new Person(RandomString(10), DateTime.Now.ToString(), RandomString(4), random.Next(51, 120),(Person.PersonalityType)random.Next(0, 4), i));
        }

        //Example query 1
        foreach (Person person in youngerPersons.Where(w => w.Id > 4567 && w.Age > 70))
        {
            Console.WriteLine(person.Id);
        }

        //Example query  2
        foreach (Person person in youngerPersons)
        {
            foreach (Person olderPerson in olderPersons)
            {
                if (person.Personality == olderPerson.Personality)
                {
                    samePersonalityMatches++;
                }
            }
        }

        Console.WriteLine("Number of matches: " + samePersonalityMatches);
    }

    private static Random random = new Random((int)DateTime.Now.Ticks);

    private static string RandomString(int size)
    {
        StringBuilder builder = new StringBuilder();
        char ch;
        for (int i = 0; i < size; i++)
        {
            ch = Convert.ToChar(Convert.ToInt32(Math.Floor(26 * random.NextDouble() + 65)));
            builder.Append(ch);
        }

        return builder.ToString();
    }
}

internal class Person
{
    public enum PersonalityType
    {
        Funny = 0,
        Boring = 1, 
        Serious = 2,
        Grumpy = 3, 
        Normal = 4
    }

    public Person(string name, string dateofbirth, string nickname, int age, PersonalityType personalityType, int id)
    {
        this.Name = name;
        this.Dateofbirth = dateofbirth;
        this.Nickname = nickname;
        this.Age = age;
        this.Id = id;
        this.Personality = personalityType;
    }

    public string Name { get; set; }

    public string Dateofbirth { get; set; }

    public string Nickname { get; set; }

    public int Age { get; set; }

    public int Id { get; set; }

    public PersonalityType Personality { get; set; }
}

Basically I would like to understand best practices to get the most performance out of both examples queries I put in the code. I have read some performance related material on using intersect, but i'm not sure which and when is best to use to get most performance. The lists are a bit OTT(size wise), but it made example two more interesting to run.

Upvotes: 0

Answers (3)

Thomas Levesque

Reputation: 292725

The first example query is fine, there's probably nothing you can do to improve its performance.

In the second query, you could use a join:

samePersonalityMatches =
    youngerPersons.Join(olderPersons,
                        p => p.Personality,
                        p => p.Personality,
                        (left, right) => null) // the result doesn't really matter,
                                               // since we just want the count
                  .Count();

Or if you prefer the query syntax:

samePersonalityMatches =
    (from y in youngerPersons
     join o in olderPersons
         on y.Personality equals o.Personality
     select null).Count();

A join enumerates each list only once, so the complexity is O(M + N), instead of O(M * N)

Upvotes: 3

James World

Reputation: 29806

The specific answers here from Jason and Thomas assume (quite reasonably given the way your question was phrased) that you don't know the questions before you have the data.

Just thought I'd point out, that if you do know the questions you are going to ask your data, you can keep running aggregates as you add young and old people to your lists so the answer is always ready.

This is a similar idea to the motivation for opting to build particular indexes in a database.

Upvotes: 0

jason

Reputation: 241789

One is fine, very close to optimal and I'd leave it like you have it (remember, programmer time is more expensive than machine time).

For two you can do a lot better. You're walking the olderPersons list too many times, let's see if we can get it down to one traversal.

Dictionary<Personality, int> dict =
    youngerPersons.GroupBy(p => p.Personality)
                  .ToDictionary(g => g.Key, g => g.Count());
long samePersonalityMatches =
    olderPersons.Select(
                     q => dict.ContainsKey(q.Personality) ?
                     dict[q.Personality] : 0
                )
                .Sum();

And then, once we see this, we should say to ourselves, hey, this looks like a hash join! Then, we can write it more clearly:

long samePersonalityMatches = 
    youngerPersons.Join(
                      olderPersons,
                      p => p.Personality,
                      q => q.Personality,
                      (p, q) => null
                  )
                  .Count();

Any time you see the pattern nested loop, match over outer, inner you should be thinking of a join!

Upvotes: 3

Using Lists and Linq, writing efficient code

Answers (3)

Related Questions