juFo
juFo

Reputation: 18577

LINQ query optimization for slow grouping

I have a LINQ query that gets data via Entity Framework Code First from an SQL database. This works, but it works very very slow.

This is the original query:

      var tmpResult = from mdv in allMetaDataValues
                  where mdv.Metadata.InputType == MetadataInputType.String && mdv.Metadata.ShowInFilter && !mdv.Metadata.IsHidden && !string.IsNullOrEmpty(mdv.ValueString)
                  group mdv by new
                  {
                    mdv.ValueString,
                    mdv.Metadata
                  } into g
                  let first = g.FirstOrDefault()
                  select new
                  {
                    MetadataTitle = g.Key.Metadata.Title,
                    MetadataID = g.Key.Metadata.ID,
                    CollectionColor = g.Key.Metadata.Collection.Color,
                    CollectionID = g.Key.Metadata.Collection.ID,

                    MetadataValueCount = 0,
                    MetadataValueTitle = g.Key.ValueString,
                    MetadataValueID = first.ID
                  };

This is the generated SQL from the original query:

{SELECT 
0 AS [C1], 
[Project4].[Title] AS [Title], 
[Project4].[ID] AS [ID], 
[Extent9].[Color] AS [Color], 
[Project4].[Collection_ID] AS [Collection_ID], 
[Project4].[ValueString] AS [ValueString], 
[Project4].[C1] AS [C2]
FROM   (SELECT 
    [Project2].[ValueString] AS [ValueString], 
    [Project2].[ID] AS [ID], 
    [Project2].[Title] AS [Title], 
    [Project2].[Collection_ID] AS [Collection_ID], 
    (SELECT TOP (1) 
        [Filter4].[ID1] AS [ID]
        FROM ( SELECT [Extent6].[ID] AS [ID1], [Extent6].[ValueString] AS [ValueString], [Extent7].[Collection_ID] AS [Collection_ID1], [Extent8].[ID] AS [ID2], [Extent8].[InputType] AS [InputType], [Extent8].[ShowInFilter] AS [ShowInFilter], [Extent8].[IsHidden] AS [IsHidden1]
            FROM   [dbo].[MetadataValue] AS [Extent6]
            LEFT OUTER JOIN [dbo].[Media] AS [Extent7] ON [Extent6].[Media_ID] = [Extent7].[ID]
            INNER JOIN [dbo].[Metadata] AS [Extent8] ON [Extent6].[Metadata_ID] = [Extent8].[ID]
            WHERE ( NOT (([Extent6].[ValueString] IS NULL) OR (( CAST(LEN([Extent6].[ValueString]) AS int)) = 0))) AND ([Extent7].[IsHidden] <> cast(1 as bit))
        )  AS [Filter4]
        WHERE (2 =  CAST( [Filter4].[InputType] AS int)) AND ([Filter4].[ShowInFilter] = 1) AND ([Filter4].[IsHidden1] <> cast(1 as bit)) AND ([Filter4].[Collection_ID1] = @p__linq__0) AND (([Project2].[ValueString] = [Filter4].[ValueString]) OR (([Project2].[ValueString] IS NULL) AND ([Filter4].[ValueString] IS NULL))) AND (([Project2].[ID] = [Filter4].[ID2]) OR (1 = 0))) AS [C1]
    FROM ( SELECT 
        [Distinct1].[ValueString] AS [ValueString], 
        [Distinct1].[ID] AS [ID], 
        [Distinct1].[Title] AS [Title], 
        [Distinct1].[Collection_ID] AS [Collection_ID]
        FROM ( SELECT DISTINCT 
            [Filter2].[ValueString] AS [ValueString], 
            [Filter2].[ID3] AS [ID], 
            [Filter2].[InputType1] AS [InputType], 
            [Filter2].[Title1] AS [Title], 
            [Filter2].[ShowInFilter1] AS [ShowInFilter], 
            [Filter2].[IsHidden2] AS [IsHidden], 
            [Filter2].[Collection_ID2] AS [Collection_ID]
            FROM ( SELECT [Filter1].[ValueString], [Filter1].[Collection_ID3], [Filter1].[IsHidden3], [Filter1].[ID3], [Filter1].[InputType1], [Filter1].[Title1], [Filter1].[ShowInFilter1], [Filter1].[IsHidden2], [Filter1].[Collection_ID2]
                FROM ( SELECT [Extent1].[ValueString] AS [ValueString], [Extent2].[Collection_ID] AS [Collection_ID3], [Extent4].[IsHidden] AS [IsHidden3], [Extent5].[ID] AS [ID3], [Extent5].[InputType] AS [InputType1], [Extent5].[Title] AS [Title1], [Extent5].[ShowInFilter] AS [ShowInFilter1], [Extent5].[IsHidden] AS [IsHidden2], [Extent5].[Collection_ID] AS [Collection_ID2]
                    FROM     [dbo].[MetadataValue] AS [Extent1]
                    LEFT OUTER JOIN [dbo].[Media] AS [Extent2] ON [Extent1].[Media_ID] = [Extent2].[ID]
                    INNER JOIN [dbo].[Metadata] AS [Extent3] ON [Extent1].[Metadata_ID] = [Extent3].[ID]
                    LEFT OUTER JOIN [dbo].[Metadata] AS [Extent4] ON [Extent1].[Metadata_ID] = [Extent4].[ID]
                    LEFT OUTER JOIN [dbo].[Metadata] AS [Extent5] ON [Extent1].[Metadata_ID] = [Extent5].[ID]
                    WHERE ( NOT (([Extent1].[ValueString] IS NULL) OR (( CAST(LEN([Extent1].[ValueString]) AS int)) = 0))) AND ([Extent2].[IsHidden] <> cast(1 as bit)) AND (2 =  CAST( [Extent3].[InputType] AS int)) AND ([Extent3].[ShowInFilter] = 1)
                )  AS [Filter1]
                WHERE [Filter1].[IsHidden3] <> cast(1 as bit)
            )  AS [Filter2]
            WHERE [Filter2].[Collection_ID3] = @p__linq__0
        )  AS [Distinct1]
    )  AS [Project2] ) AS [Project4]
LEFT OUTER JOIN [dbo].[Collection] AS [Extent9] ON [Project4].[Collection_ID] = [Extent9].[ID]}

If we remove the "let first = g.FirstOrDefault()" and change "MetadataValueID = first.ID" to "MetadataValueID = 0" so that we just have a fixed ID = 0 for testing purposes, then the data loads very fast and the generated query itself is half the size compared to the original So it seems that this part is making the query very slow:

let first = g.FirstOrDefault()
...
  MetadataValueID = first.ID
};

How can this be rewritten? If I try to rewrite the code, it is still slow:

MetadataValueID = g.Select(x => x.ID).FirstOrDefault()

or

let first = g.Select(x => x.ID).FirstOrDefault()
...
  MetadataValueID = first
};

Any suggestions?

Upvotes: 3

Views: 2418

Answers (2)

Vojtěch Dohnal
Vojtěch Dohnal

Reputation: 8104

Using EF I have allways felt that it has problems efficiently translating stuff like g.Key.Metadata.Collection, so I try to join more explicitly and to include only fields, that are neccessary for your result. You can use include instead of join using repository pattern.

Then your query would look like this:

   from mdv in allMetaDataValues.Include("Metadata").Include("Metadata.Collection")
   where mdv.Metadata.InputType == MetadataInputType.String && 
         mdv.Metadata.ShowInFilter && 
         !mdv.Metadata.IsHidden && 
         !string.IsNullOrEmpty(mdv.ValueString)
   group mdv by new
   {
     MetadataID = mdv.Metadata.ID,
     CollectionID = mdv.Metadata.Collection.ID,
     mdv.Metadata.Title,
     mdv.Metadata.Collection.Color,
     mdv.ValueString
   } into g
   let first = g.FirstOrDefault().ID
   select new
   {
     MetadataTitle = g.Key.Title,
     MetadataID = g.Key.MetadataID,
     CollectionColor = g.Key.Color,
     CollectionID = g.Key.CollectionID,
     MetadataValueCount = 0,
     MetadataValueTitle = g.Key.ValueString,
     MetadataValueID = first
   }

Good tool for playing with linq is LinqPad.

The problem is also that:

  let first = g.FirstOrDefault().ID

cannot be easily translated to SQL see this answer. But this rewrite simplifies the underlying query for it at least. It remains to me unclear, why you need first ID from a set without using orderby.

It could be rewriten like this:

let first =  (from f in allMetaDataValues
              where f.Metadata.ID == g.Key.MetadataID && 
                    f.ValuesString == g.Key.ValuesString select f.ID)
             .FirstOrDefault()

This way you do not let EF write the query for you and you can specify exactly how to do the select. To speed up the query you can also consider adding indexes to database according to the generated query - namely index using both colums used in where clause of this let first query.

Upvotes: 1

Ishtiaq
Ishtiaq

Reputation: 1058

Try the following solution.
Replace FirstOrDefault() with .Take(1). FirstOrDefault() is not lazy loaded.

var tmpResult = from mdv in allMetaDataValues
                  where mdv.Metadata.InputType == MetadataInputType.String && mdv.Metadata.ShowInFilter && !mdv.Metadata.IsHidden && !string.IsNullOrEmpty(mdv.ValueString)
                  group mdv by new
                  {
                    mdv.ValueString,
                    mdv.Metadata
                  } into g
                  let first = g.Take(1)
                  select new
                  {
                    MetadataTitle = g.Key.Metadata.Title,
                    MetadataID = g.Key.Metadata.ID,
                    CollectionColor = g.Key.Metadata.Collection.Color,
                    CollectionID = g.Key.Metadata.Collection.ID,

                    MetadataValueCount = 0,
                    MetadataValueTitle = g.Key.ValueString,
                    MetadataValueID = first.ID
                  };

Upvotes: 0

Related Questions