Gelion
Gelion

Reputation: 531

Replacing Include() calls to Select()

Im trying to eliminate the use of the Include() calls in this IQueryable definition:

return ctx.timeDomainDataPoints.AsNoTracking()
   .Include(dp => dp.timeData)
   .Include(dp => dp.RecordValues.Select(rv => rv.RecordKind).Select(rk => rk.RecordAlias).Select(fma => fma.RecordAliasGroup))
   .Include(dp => dp.RecordValues.Select(rv => rv.RecordKind).Select(rk => rk.RecordAlias).Select(fma => fma.RecordAliasUnit))
   .Where(dp => dp.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
   .Where(dp => dp.Source == 235235)
   .Where(dp => dp.timeData.time >= start && cd.timeData.time <= end)
   .OrderByDescending(cd => cd.timeData.time);

I have been having issues with the database where the run times are far too long and the primary cause of this is the Include() calls are pulling everything. This is evident in viewing the table that is returned from the resultant SQL query generated from this showing lots of unnecessary information being returned. One of the things that you learn I guess. The Database has a large collection of data points which there are many Recorded values. Each Recorded value is mapped to a Record Kind which may have a Record Alias.

I have tried creating a Select() as an alternative but I just cant figure out how to construct the right Select and also keep the entity hierarchy correctly loaded. I.e. the related entities are loaded with unnecessary calls to the DB.

Does anyone has alternate solutions that may jump start me to solve this problem.

Ill add more detail if needed.

Upvotes: 2

Views: 206

Answers (2)

Amin Sahranavard
Amin Sahranavard

Reputation: 308

your problem is hierarchy joins in your query.In order to decrease this problem create other query for get result from relation table as follows:

var items= ctx.timeDomainDataPoints.AsNoTracking().Include(dp =>dp.timeData).Include(dp => dp.RecordValues);
var ids=items.selectMany(item=>item.RecordValues).Select(i=>i.Id);

and on other request to db:

  var otherItems= ctx.RecordAlias.AsNoTracking().select(dp =>dp.RecordAlias).where(s=>ids.Contains(s.RecordKindId)).selectMany(s=>s.RecordAliasGroup)

to this approach your query do not have internal joins.

Upvotes: 0

Harald Coppoolse
Harald Coppoolse

Reputation: 30464

You are right. One of the slower parts of a database query is the transport of the selected data from the DBMS to your local process. Hence it is wise to limit this.

Every TimeDomainDataPoint has a primary key. All RecordValues of this TimeDomainDataPoint have a foreign key TimeDomainDataPointId with a value equal to this primary key.

So If TimeDomainDataPoint with Id 4 has a thousand RecordValues, then every RecordValue will have a foreign key with a value 4. It would be a waste to transfer this value 4 a 1001 times, while you only need it once.

When querying data, always use Select and select only the properties you actually plan to use. Only use Include if you plan to update the fetched included items.

The following will be much faster:

var result = dbContext.timeDomainDataPoints
    // first limit the datapoints you want to select
    .Where(datapoint => d.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
    .Where(datapoint => datapoint.Source == 235235)
    .Where(datapoint => datapoint.timeData.time >= start
                     && datapoint.timeData.time <= end)
    .OrderByDescending(datapoint => datapoint.timeData.time)

    // then select only the properties you actually plan to use
    Select(dataPoint => new
    {
        Id = dataPoint.Id,
        RecordValues = dataPoint.RecordValues
            .Where(recordValues => ...)           // if you don't want all RecordValues
            .Select(recordValue => new
            {
                // again: select only the properties you actually plan to use:
                Id = recordValue.Id,
                // not needed, you know the value: DataPointId = recordValue.DataPointId,
                RecordKinds = recordValues.RecordKinds
                    .Where(recordKind => ...) // if you don't want all recordKinds
                    .Select(recordKind => new
                    {
                         ... // only the properties you really need!
                    })
                    .ToList(),
                 ...
            })
            .ToList(),

        TimeData = dataPoint.TimeData.Select(...),
        ...
    });

Possible imporvement

The part:

.Where(datapoint => d.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))

is used to fetch only datapoints that have recordValues with a non-null RecordAlias. If you are selecting the RecordAlias anyway, consider doing this Where after your select:

.Select(...)
.Where(dataPoint => dataPoint
       .Where(dataPoint.RecordValues.RecordKind.RecordAlias != null)
       .Any());

I'm not really sure whether this is faster. If your database management system internally first creates a complete table with all columns of all joined tables and then throws away the columns that are not selected, then it won't make a difference. However, if it only creates a table with the columns it actually uses, then the internal table will be smaller. This could be faster.

Upvotes: 2

Related Questions