WonderHeart
WonderHeart

Reputation: 688

Entity Framework Linq foreach performance vs using Select() instead

I have following linq query in my C# code , which is causing System.OutOfmemory excpetion

public SortedSet<string> GetstudentCount()
{
    var studentCount= studentRepository
                  .GetBy(i => i.IsStudentEnabled && i.IsEnrolledAllSubjects) 
                  .AsQueryable()
                  .AsNoTracking()
                  .ToList();

    var studentSortedSet= new SortedSet<string>();
    foreach (var student in studentCount)
    {
        var id = string.Format("{0}:{1}", student.studentFirstName, student.totalScore);
        studentSortedSet.Add(id);
    }

     return new SortedSet<string>(studentCount);
}

So i am trying to optimize it and could see these options, but I am quite not sure since in my development database i don't have enough data to test. I am new to Entity Framework and to Linq, its a bit confusing to me to figure out which way is correct.

1) Removed ToList() in Linq query, but foreach() is taking same time as before(still slow)

2) Tried to remove entire foreach() and added Select() in the Linq query itself, like:

public SortedSet<string> GetStudentCount()
{
    var studentCount= studentRepository
                          .GetBy(i => i.IsStudentEnabled && 
                           i.IsEnrolledAllSubjects)
                          .Select(string.Format("{0}:{1}",                          
                           student.studentFirstName, student.totalScore)) 
                          .AsQueryable()
                          .AsNoTracking()
                          .ToList();

      return new SortedSet<string>(studentCount);
}

But even this takes same time(still slow)

I further thought of removing ToList() here, but this method is used at many places( can confirm that no looping is done on studentCount) , which i am quite not sure if this may cause more issues.

Any suggestion/advice on this is more appreciable.

EDIT2:

public IEnumerable<T> GetBy(Expression<Func<T, bool>> predicate)
{
    return dbSet.Where(predicate);
}

EDIT:

It may be a basic question to many, I request them not to down vote this Question, as here I am trying to get some pointers on how to optimize. There are no hard feelings here. I am trying to learn the stuff and I would appreciate if this makes sense.Thank you

Upvotes: 0

Views: 2268

Answers (2)

johnny 5
johnny 5

Reputation: 21031

There are a few things you can do to optimize. But lets go over some issues in general first.

Don't use the repository anti-pattern, you're just writing a wrapper for something entityframework does.

Don't pull everything into memory, if you're getting an out of memory exception, (assuming you've done nothing wrong in your code else where), you're pulling too much data into memory, If you need to pull that much data. create a paging api.

Select only the data you need from the DB. As you've already discovered, pulling back the whole data set when you just need the Firstname and Total Score is a waste.

Pick Better data structure. There is no reason to use a sorted set of string. Also Its quite doubt full that you will get the results you want because people with single digit low scores will be sorted higher, because you're using an alpha sort e.g

Andy:93
Andy:8
Andy:79    

Your sort should probably be done on the SQL end When all of thats done you should have something that looks like this (Minus the paging):

 public class StudentScores
 {
     public string Name { get; set;}
     public string TotalScore {get; set; }
 }

 var results = dbContext.Students.AsNoTracking().Where(s => s.IsStudentEnabled 
                                             && s.IsEnrolledAllSubjects)
                .OrderBy(x => x.studentFirstName).ThenBy(x => x.totalScore)
                .Select(x => new StudentScores { 
                                       Name = x.studentFirstName,
                                       TotalScore = x.totalScore
                }).ToList();

There are other micro-optimizations that you can make, e.g Compiled queries, but Id just start with this for now.

(PS. My main guess is the error is because of something you're not showing us, It seems like you're not showing us the full code because Where is the student variable coming from in this line .Select(string.Format("{0}:{1}", student.studentFirstName, student.totalScore))

Upvotes: 2

Dongdong
Dongdong

Reputation: 2508

  1. avoid to use linq in big query, but use stored procedure or functions instead.
  2. use pages to split query: YourFunction(...,int pageSize = 50, int page = 0), then, Get(...).Skip(pageSize *page).Take(pageSize). no page will show 100+ results
  3. if it's web app, split data and page, data will be ajax loaded
  4. use CompiledQuery.Compile: https://www.codeproject.com/Articles/38174/How-to-improve-your-LINQ-query-performance-by-X
  5. use Parallel query and yield return
  6. try to use RoslynLinqRewrite or LinqOptimiser mentioned by this post: http://mattwarren.org/2016/09/29/Optimising-LINQ/
  7. More performance details:

Upvotes: 2

Related Questions