Reputation: 7526
For example, I have table with 10000000 records and list in memory with 100000 ids.
How I can select all records by these ids? Obviously, I can do it like this:
var ids = GetIds();
var result = from q in ctx.Records
where ids.Contains(q.Id)
select q;
But if I do it like this, it will have two problems:
Monstrosity in generated SQL each time I use this query.
O(N) complexety for each row. So, I will receive my result in couple of years.
How I can do this? How is this done with linq2sql
or is this impossible to bypass?
Update
Yes, I tested it, I debugged it and HAVE seen what is generated. It is something like this:
SELECT * FROM Records WHERE id IN (/*one hundreds of thousand goes here O_O*/)
Simple math: 10.000.000 x 100.000 will give 1.000.000.000.000 in time complexity. This is extremely unefficient.
Upvotes: 1
Views: 360
Reputation: 37299
As you said doing these operations in memory isn't good. The solution is to have another table in your database and then your code will be:
var ids = GetIds();
//Insert all ids into database (bulk insert)
var result = from q in ctx.Records
join i in ctx.Ids on q.Id equals i.Id
select q;
This way the operation will take place in the database and you will overcome the restriction of the amount of items in a in
clause.
id
field of Records
should have an index (or if it is a primary key it already is also indexed)Ids
table should also be indexed This way the join will be a Hash join
and not a Nested Join
- equivalent for a .Net solution of using the ids
collection as a HashSet<string>
instead of a List<string>
.
Last, to support many operations in parallel you can have your Ids
with 2 columns: Id
, OperationId
. The OperationId
will be given to all the Id
s of a specific insert and then your query would look like:
var result = from q in ctx.Records
join i in ctx.Ids on q.Id equals i.Id
where i.OperationId = _the operationId given by the insert_
select q;
_Make sure that the OperationId
is indexed too - can be the same index of the Id
but see that it is the first in order.
Depending on the DBMS you can also have your table as a temporary table
- where at least for Oracle it stored the data per connected session - so you can insert from multiple sessions at once but the data isn't shared - so you don't need the OperationId
Upvotes: 1