Reputation: 36930
I've read up on several questions for the error "Explicit construction of entity type in query is not allowed", and the various ways to get around it.
I am using DBML automatically-generated LINQ to SQL classes in my code, so it would be great to be able to select and insert data appropriately. Here is one approach suggested in another post; in the below example, e_activeSession is an automatically generated representation of a table in the DataContext:
var statistics =
from record in startTimes
group record by record.startTime into g
select new e_activeSession
{
workerId = wcopy,
startTime = g.Key.GetValueOrDefault(),
totalTasks = g.Count(),
totalTime = g.Max(o => o.record.timeInSession).GetValueOrDefault(),
minDwell = g.Min(o => o.record.dwellTime).GetValueOrDefault(),
maxDwell = g.Max(o => o.record.dwellTime).GetValueOrDefault(),
avgDwell = g.Average(o => o.record.dwellTime).GetValueOrDefault(),
stdevDwell = g.Select(o => Convert.ToDouble(o.record.dwellTime)).StdDev(),
total80 = g.Sum(o => Convert.ToInt16(o.record.correct80) + Convert.ToInt16(o.record.wrong80)),
correct80 = g.Sum(o => Convert.ToInt16(o.record.correct80)),
percent80 = Convert.ToDouble(g.Sum(o => Convert.ToInt16(o.record.correct80))) /
g.Sum(o => Convert.ToInt16(o.record.correct80) + Convert.ToInt16(o.record.wrong80))
};
The above throws the error, so I tried the following:
var groups =
from record in startTimes
group record by record.startTime
into g
select g;
var statistics = groups.ToList().Select(
g => new e_activeSession
{
workerId = wcopy,
startTime = g.Key.GetValueOrDefault(),
totalTasks = g.Count(),
totalTime = g.Max(o => o.record.timeInSession).GetValueOrDefault(),
minDwell = g.Min(o => o.record.dwellTime).GetValueOrDefault(),
maxDwell = g.Max(o => o.record.dwellTime).GetValueOrDefault(),
avgDwell = g.Average(o => o.record.dwellTime).GetValueOrDefault(),
stdevDwell = g.Select(o => Convert.ToDouble(o.record.dwellTime)).StdDev(),
total80 = g.Sum(o => Convert.ToInt16(o.record.correct80) + Convert.ToInt16(o.record.wrong80)),
correct80 = g.Sum(o => Convert.ToInt16(o.record.correct80)),
percent80 = Convert.ToDouble(g.Sum(o => Convert.ToInt16(o.record.correct80))) /
g.Sum(o => Convert.ToInt16(o.record.correct80) + Convert.ToInt16(o.record.wrong80))
});
However, the ToList
seems to be incredibly inefficient and just makes my code sit there for a long time. Is there a better way to do this?
Upvotes: 4
Views: 2091
Reputation: 113342
AsEnumerable()
will do the same thing as ToList()
in terms of bringing the processing into linq-to-objects, but won't waste time and memory storing all of them first. Instead when you iterate through it, it'll create the objects one at a time.
As a rule, you should use AsEnumerable()
to move operations from another source into memory, rather than ToList()
unless you really want a list (e.g. if you'll hit the same data more than once, so the list acts a a cache).
So so far we have:
var statistics = (
from record in startTimes
group record by record.startTime
into g
select g;
).AsEnumerable().Select(
g => new e_activeSession
{
workerId = wcopy,
startTime = g.Key.GetValueOrDefault(),
totalTasks = g.Count(),
totalTime = g.Max(o => o.record.timeInSession).GetValueOrDefault(),
/* ... */
});
But there's a bigger problem. You want to be careful with group by
too. When done along with aggregate methods, it's normally okay, but otherwise it can end up being turned into many database calls (one to obtain the different values for the keys, and then one per each value).
Considering the above (with my elision to not mention every single column). Without using AsEnumerable()
(or ToList()
or what have you), since wcopy
is presumably outside of the query entirely (I can't see where it's defined), the SQL produced by the first would be (if it were allowed), something like:
select startTime, count(id), max(timeInSession), /* ... */
from tasks
group by startTime
Which should be handled pretty efficiently by the database (if it isn't, check indices and run the Database Engine Tuning Advisor on the query produced).
With grouping in memory though, it's likely going to first perform:
select distinct startTime from tasks
And then
select timeInSession, /* ... */
from tasks
where startTime = @p0
For each and every distinct startTime
found, passing it in as the @p0
. This can quickly become disastrous no matter how efficient the rest of the code.
We have two choices. Just which is best varies from case to case, so I'll give both, though the second is the most efficient here.
Sometimes our best approach is to load all relevant rows and do the grouping in memory:
var statistics =
from record in startTimes.AsEnumerable()
group record by record.startTime
into g
select new e_activeSession
{
workerId = wcopy,
startTime = g.Key.GetValueOrDefault(),
totalTasks = g.Count(),
totalTime = g.Max(o => o.record.timeInSession).GetValueOrDefault(),
/* ... */
};
We can maybe make it a bit more efficient still by only selecting columns we care about (irrelevant if the above uses every column in the table anyway)
var statistics =
from record in (
from dbRec in startTimes
select new {dbRec.startTime, dbRec.timeInSession, /*...*/}).AsEnumerable()
group record by record.startTime
into g
select new e_activeSession
{
workerId = wcopy,
startTime = g.Key.GetValueOrDefault(),
totalTasks = g.Count(),
totalTime = g.Max(o => o.record.timeInSession).GetValueOrDefault(),
/* ... */
};
I don't think this would be the best case here though. I would use this in cases where I was going to enumerate the groups, and then enumerate each group. In your case where you do an aggregate on every group and don't enumerate through them, it would be better to keep that aggregate work in the database. Databases are good at them, and it'll reduce the total amount of data sent over the wire considerably. The best I can think of in this case is to force a new object other than the entity type that mirrors it, but which is not recognised as an entity. You could create a type just for this (useful if you're doing several variants on this), otherwise just use an anonymous type:
var statistics = (
from record in startTimes
group record by record.startTime
into g
select new{
startTime = g.Key.GetValueOrDefault(),
totalTasks = g.Count(),
totalTime = g.Max(o => o.record.timeInSession).GetValueOrDefault(),
/* ... */
}).AsEnumerable().Select(
d => new e_activeSession
{
workerId = wcopy,
startTime = d.startTime,
totalTasks = d.totalTasks,
/* ... */
});
The obvious downside to this is the sheer verbosity. However, it will keep operations best done in the db, in the db, while still not wasting time and memory like ToList()
does, not hitting the db repeatedly, and dragging the e_activeSession
creation out of the linq2sql and into linq2objects, so it should be allowed.
(Incidentally, the convention in .NET is for class and member names to begin with capitals. There's no technical reason for that, but doing so means you'll match more people's code, including that of the BCL and other libraries you use).
Edit: Second incidentally; I've just seen your other question. Note that in a way, the AsEnumerable()
here is a variant on exactly what was causing the problem with that one. Grok that, and you'll have grokked a lot about the boundaries between different linq query providers.
Upvotes: 2