Alonzzo2
Alonzzo2

Reputation: 1009

Entity Framework 6 - Group by then Order by the First() takes too long

I really need help with this one and couldn't find related answers after hours of searching.

MySQL, Entity Framework 6, database with a few millions of records, record looks like:

Indexint(11) NOT NULL  
TaskIDint(11) NOT NULL  
DeviceIDbigint(20) NOT NULL  
Commentslongtext NULL  
ExtendedResultslongtext NULL  
RunResultint(11) NOT NULL  
JobResultint(11) NOT NULL  
JobResultValuedouble NOT NULL  
ReporterIDbigint(20) NOT NULL  
FieldIDbigint(20) NOT NULL  
TimeOfRundatetime NOT NULL  

What I need is to get all the records for a specific taskID, then group by DeviceID and sort by TimeOfRun in order to get the latest data for each deviceID in a specific taskID.

This is my code:

List<JobsRecordHistory> newH = db.JobsRecordHistories.AsNoTracking().Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID).
                Select(x => x.OrderByDescending(y => y.TimeOfRun).FirstOrDefault()).ToList();

But this is the generated query:

{SELECT
`Apply1`.`Index`, 
`Apply1`.`TaskID`, 
`Apply1`.`DEVICEID1` AS `DeviceID`, 
`Apply1`.`RunResult`, 
`Apply1`.`JobResult`, 
`Apply1`.`JobResultValue`, 
`Apply1`.`ExtendedResults`, 
`Apply1`.`Comments`, 
`Apply1`.`ReporterID`, 
`Apply1`.`FieldID`, 
`Apply1`.`TimeOfRun`
FROM (SELECT
`Project2`.`p__linq__0`, 
`Project2`.`DeviceID`, 
(SELECT
`Project3`.`Index`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `Index`, 
(SELECT
`Project3`.`TaskID`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `TaskID`, 
(SELECT
`Project3`.`DeviceID`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `DEVICEID1`, 
(SELECT
`Project3`.`RunResult`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `RunResult`, 
(SELECT
`Project3`.`JobResult`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `JobResult`, 
(SELECT
`Project3`.`JobResultValue`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `JobResultValue`, 
(SELECT
`Project3`.`ExtendedResults`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `ExtendedResults`, 
(SELECT
`Project3`.`Comments`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `Comments`, 
(SELECT
`Project3`.`ReporterID`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `ReporterID`, 
(SELECT
`Project3`.`FieldID`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `FieldID`, 
(SELECT
`Project3`.`TimeOfRun`
FROM `JobsRecordHistories` AS `Project3`
 WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
 ORDER BY 
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `TimeOfRun`
FROM (SELECT
@p__linq__0 AS `p__linq__0`, 
`Distinct1`.`DeviceID`
FROM (SELECT DISTINCT 
`Extent1`.`DeviceID`
FROM `JobsRecordHistories` AS `Extent1`
 WHERE `Extent1`.`TaskID` = @p__linq__0) AS `Distinct1`) AS `Project2`) AS `Apply1`}

Which takes way too long.
I don't know SQL good enough, I admit, but if insert a ToList() after the WHERE statement, then I get the results much quicker, though it's still not the right thing to do since there's a lot of un-needed data that the database passes to my app in this situation, and it's still slow = 30 seconds for 40k records.

I also tried this:

Dictionary<long, DateTime> DeviceIDAndTime = db.JobsRecordHistories.AsNoTracking().Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID)
                .Select(g => new DeviceIDaAndTime { deviceID = g.Key, timeOfRun = g.Max(gi => gi.TimeOfRun) }).ToDictionary(x => x.deviceID, x => x.timeOfRun);

In order to use the dictionary this way:

                List<JobsRecordHistory> newH = db.JobsRecordHistories.AsNoTracking().Where(x => DeviceIDAndTime.Keys.Contains(x.DeviceID) && x.TimeOfRun == DeviceIDAndTime[x.DeviceID]).ToList();

But I get this error:

Additional information: LINQ to Entities does not recognize the method 'System.DateTime get_Item(Int64)' method, and this method cannot be translated into a store expression.

Which makes sense cause from what I understand, when comparing the timeOfRun to the dictionary value, LINQ needs a specific value and not a collection when composing the query.

It's weird to me that I didn't find any related post and that other people didn't encounter this problem. I guess I missed something.

Appreciate any help, Thanks

Upvotes: 4

Views: 2641

Answers (2)

Alonzzo2
Alonzzo2

Reputation: 1009

Finally figured it out and improved performance.
I needed a query and a sub query, And I needed the MAX function instead of ORDER because I don't care about the order of the results, I only care about the biggest (timeOfRun) one.
Additionally, things were simplified once I noticed that bigger Index column (my PK, auto increment) means more recent data, so I didn't need MAX(timeOfRun), instead I used MAX(Index), though I'm quite sure it would have worked the same way.

This is my LINQ:

var historyQuery = db.JobsRecordHistories.AsNoTracking().Where(y => y.TaskID == taskID &&
                                    db.JobsRecordHistories.Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID).Select(g => g.Max(i => i.Index)).Contains<int>(y.Index));

And this is the generated SQL:

{SELECT
`Extent1`.`Index`, 
`Extent1`.`TaskID`, 
`Extent1`.`DeviceID`, 
`Extent1`.`RunResult`, 
`Extent1`.`JobResult`, 
`Extent1`.`JobResultValue`, 
`Extent1`.`ExtendedResults`, 
`Extent1`.`Comments`, 
`Extent1`.`ReporterID`, 
`Extent1`.`FieldID`, 
`Extent1`.`TimeOfRun`
FROM `JobsRecordHistories` AS `Extent1`
 WHERE (`Extent1`.`TaskID` = @p__linq__0) AND (EXISTS(SELECT
1 AS `C1`
FROM (SELECT
`Extent2`.`DeviceID` AS `K1`, 
MAX(`Extent2`.`Index`) AS `A1`
FROM `JobsRecordHistories` AS `Extent2`
 WHERE `Extent2`.`TaskID` = @p__linq__1
 GROUP BY 
`Extent2`.`DeviceID`) AS `GroupBy1`
 WHERE `GroupBy1`.`A1` = `Extent1`.`Index`))}

I hope this will help somebody since It took me 1.5 days of googling, looking at SQL queries, LINQ, debugging and optimizing

Upvotes: 6

robaudas
robaudas

Reputation: 1638

Give query syntax instead of method based a shot.
I haven't tested this locally but you might see improved sql generation.
Or at least maybe this approach might lead you down the right path

using System;
using System.Data.Entity;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace EF.CodeFirst
{
    [TestClass]
    public class UnitTest1
    {
        [TestMethod]
        public void TestMethod1()
        {
            using (var db = new TestDbContext())
            {
                var taskId = 1;
                var query = from job in db.JobRecordHistories
                    where job.TaskId == taskId
                    orderby job.TimeOfRun descending
                    group job by job.DeviceId
                    into deviceGroup
                    select deviceGroup;

                foreach (var deviceGroup in query)
                {
                    foreach (var jobRecordHistory in deviceGroup)
                    {
                        Console.WriteLine("DeviceId '{0}', TaskId'{1}' Runtime'{2}'", jobRecordHistory.DeviceId,
                            jobRecordHistory.TaskId, jobRecordHistory);
                    }
                }
            }
        }
    }

    public class TestDbContext : DbContext
    {
        public DbSet<JobRecordHistory> JobRecordHistories { get; set; }
    }

    public class JobRecordHistory
    {
        public int Id { get; set; }
        public int TaskId { get; set; }
        public int DeviceId { get; set; }
        public DateTime TimeOfRun { get; set; }
    }
}

Upvotes: 1

Related Questions