Rocky Singh

Reputation: 15440

LINQ performance for in-memory collection

I have a collection of users with around 100K+ records (all user objects fully loaded from the database, with fields like Bio, First name, Last name, etc.). This collection is fetched from the database on application start and kept in memory.

Then I have code like:

    User cachedUser = users.FirstOrDefault(x => string.Equals(x.UserName, username,
        StringComparison.CurrentCultureIgnoreCase));

I use this to fetch users from the collection, but I noticed that the operation is incredibly slow. Is there a performance issue when using LINQ to query an in-memory collection of large objects? Should I instead call the DB each time I need a user?

Upvotes: 8

Views: 9339

Answers (4)

slfan

Reputation: 9139

Your LINQ query, like any other iteration technique (a loop, a search through an array), accesses every single record until the requested record is found. In the worst case that means 100k comparisons. To make this faster, you have the following options:

  1. Use a sorted list or a dictionary: a binary search is a lot faster (see the sketch after this list). Sort the data when fetching it from the database by using ORDER BY.
  2. Use a DataSet. It's like an in-memory database which provides faster searching.
  3. Leave the data in the database and set appropriate indexes for faster access.
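
A minimal sketch of option 1, assuming the users come back from the database already ordered by UserName (LoadUsersFromDatabase and the parallel key list are illustrative names, not from the question):

    // The list must be sorted with the same comparer that is used for the search.
    StringComparer comparer = StringComparer.CurrentCultureIgnoreCase;
    List<User> users = LoadUsersFromDatabase();                  // hypothetical loader, ordered by UserName
    List<string> keys = users.Select(u => u.UserName).ToList(); // parallel, sorted key list

    // O(log n) binary search instead of scanning all 100k records.
    int index = keys.BinarySearch(username, comparer);
    User cachedUser = index >= 0 ? users[index] : null;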

I suggest using the database for the following reasons:

  • It's a waste of memory to store 100k records, most of which you will probably never use.
  • As soon as the data changes, you have to refresh your cache, which can be rather complex.
  • Web applications are multithreaded (every request runs in its own thread). If you change your data, you have to synchronize access with locks.
  • A database can cache frequently requested data.
  • You have to write less code.
  • You keep a stateless web application, which scales better (web farms).
  • Your application probably has other data; you cannot store everything in memory.

Upvotes: 3

Aristos

Reputation: 66649

The difference in search performance that you notice is because the database uses an index to locate the string, while in memory you simply scan all the records until you find the right one. The database also keeps a hash for each string and searches on that hash, which is a lot faster than doing actual string comparisons.

A Dictionary<> gives you a similar kind of index, but it has a cost when adding data: as the collection grows, every insert has to find the correct place in the internal structure (see the pre-sizing sketch below).
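
One way to reduce that insert cost is to pre-size the dictionary so the 100k adds do not trigger repeated internal resizing; a sketch (the 150000 capacity is just an illustrative guess above the expected record count):

    // Pre-sizing avoids repeated resizing/rehashing while the 100k users are added.
    var usersByName = new Dictionary<string, User>(150000, StringComparer.CurrentCultureIgnoreCase);
    foreach (User user in users)
        usersByName[user.UserName] = user;   // last entry wins if a username repeats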

The database also caches results; many databases cache the index as well and keep extra statistics that help locate what you are looking for quickly.

It's better to let the database do the search, unless you can build something faster for specific custom cases.

Upvotes: 0

Jason De Oliveira

Reputation: 1752

If you want to optimize your response time, you could create a Dictionary<string, User> and search for the user within it:

    Dictionary<string, User> usersDictionary =
        new Dictionary<string, User>(StringComparer.CurrentCultureIgnoreCase);

    // After querying the users from the DB, add each one to the dictionary
    foreach (User user in users)
        usersDictionary.Add(user.UserName, user);

    // Then when you need to retrieve a user (O(1) hash lookup instead of a scan)
    User retrievedUser;
    usersDictionary.TryGetValue(username, out retrievedUser);
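
If the usernames are unique, the dictionary can also be built in one pass from the existing in-memory list (a sketch; ToDictionary throws if a key appears twice):

    // Builds the same lookup in a single pass; assumes UserName is unique per user.
    Dictionary<string, User> usersDictionary =
        users.ToDictionary(u => u.UserName, StringComparer.CurrentCultureIgnoreCase);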

Hope that helps !

Upvotes: 3

Bryan Crosby

Reputation: 6554

I think you might need to re-think your architecture based on the information you have given us. Take advantage of the database and let it do the search work for you. Observe, measure, and make changes accordingly after that. You might realize that you prematurely optimized the whole thing.

Upvotes: 8
