Eakan Gopalakrishnan
Eakan Gopalakrishnan

Reputation: 890

LINQ Multiple GroupBy Query Performing several times slower than T-SQL

I'm totally new to LINQ.

I have an SQL GroupBy which runs in barely a few milliseconds. But when I try to achieve the same thing via LINQ, it just seems awfully slow. What I'm trying to achieve is fetch an average monthly duration of a ceratin database update.

In SQL =>

select SUBSTRING(yyyyMMdd, 0,7), 
       AVG (duration) 
  from (select (CONVERT(CHAR(8), mmud.logDateTime, 112)) as yyyyMMdd, 
                DateDIFF(ms, min(mmud.logDateTime), max(mmud.logDateTime)) as duration
          from mydb.mydbo.updateData mmud
          left 
          join mydb.mydbo.updateDataKeyValue mmudkv
            on mmud.updateDataid = mmudkv.updateDataId
          left 
          join mydb.mydbo.updateDataDetailKey mmuddk
            on mmudkv.updateDataDetailKeyid = mmuddk.Id
         where dbname = 'MY_NEW_DB'
           and mmudkv.value in ('start', 'finish')             
         group
            by (CONVERT(CHAR(8), mmud.logDateTime, 112))
        ) as resultSet
  group 
     by substring(yyyyMMdd, 0,7)
  order
     by substring(yyyyMMdd, 0,7)

in LINQ => I first fetch the record from a table that links information of the Database Name and UpdateData and then do filtering and groupby on the related information.

    entry.updatedata.Where(
        ue => ue.updatedataKeyValue.Any(
                  uedkv =>
                  uedkv.Value.ToLower() == "starting update" ||
                  uedkv.Value.ToLower() == "client release"))
         .Select(
             ue =>
             new
                 {
                     logDateTimeyyyyMMdd = ue.logDateTime.Date,
                     logDateTime = ue.logDateTime
                 })
         .GroupBy(
             updateDataDetail => updateDataDetail.logDateTimeyyyyMMdd)
         .Select(
             groupedupdatedata => new
                 {
                     UpdateDateyyyyMM = groupedupdatedata.Key.ToString("yyyyMMdd"),
                     Duration =
                                        (groupedupdatedata.Max(groupMember => groupMember.logDateTime) -
                                         groupedupdatedata.Min(groupMember => groupMember.logDateTime)
                                        )
                                        .TotalMilliseconds
                 }
        ).
          ToList();
var updatedataMonthlyDurations =
    updatedataInDateRangeWithDescriptions.GroupBy(ue => ue.UpdateDateyyyyMM.Substring(0,6))
                                           .Select(
                                               group =>
                                               new updatedataMonthlyAverageDuration
                                                   {
                                                       DbName = entry.DbName,
                                                       UpdateDateyyyyMM = group.Key.Substring(0,6),
                                                       Duration =
                                                           group.Average(
                                                               gmember =>
                                                               (gmember.Duration))
                                                   }
        ).ToList();

I know that GroupBy in LINQ isn't the same as GroupBy in T-SQL, but not sure what happens behind the scenes. Could anyone explain the difference and what happens in memory when I run the LINQ version? After I did the .ToList() after the first GroupBy things got a little faster. But even then this way of finding average duration is really slow. What would be the best alternative and are there ways of improving a slow LINQ statement using Visual Studio 2012?

Upvotes: 1

Views: 1680

Answers (1)

Aducci
Aducci

Reputation: 26694

Your linq query is doing most of its work in linq-to-objects. You should be constructing a linq-to-entities/sql query that generates the complete query in one shot.

Your query seems to have a redundant group by clause, and I am not sure which table dbname comes from, but the following query should get you on the right track.

var query = from mmud in context.updateData 
            from mmudkv in context.updateDataKeyValue
                                  .Where(x => mmud.updateDataid == x.updateDataId)
                                  .DefaultIfEmpty()
            from mmuddk in context.updateDataDetailKey 
                                  .Where(x => mmudkv.updateDataDetailKeyid == x.Id)
                                  .DefaultIfEmpty()
            where mmud.dbname == "MY_NEW_DB"
            where mmudkv.value == "start" ||  mmudkv.value == "finish"
            group mmud by mmud.logDateTime.Date into g
            select new 
            {
               Date = g.Key,
               Average = EntityFunctions.DiffMilliseconds(g.Max(x => x.logDateTime), g.Min(x => x.logDateTime)),
            };

var queryByMonth = from x in query
                   group x by new { x.Date.Year, x.Date.Month } into x
                   select new
                   {
                     Year = x.Key.Year,
                     Month = x.Key.Month,
                     Average = x.Average(y => y.Average)
                   };

// Single sql statement is to sent to your database
var result = queryByMonth.ToList();

If you are still having problems, we will need to know if you are using entityframework or linq-to-sql. And you will need to provide your context/model information

Upvotes: 1

Related Questions