eocron
eocron

Reputation: 7526

Linq2Sql select records by multiple ids?

For example, I have table with 10000000 records and list in memory with 100000 ids.

How I can select all records by these ids? Obviously, I can do it like this:

var ids = GetIds();
var result = from q in ctx.Records
             where ids.Contains(q.Id)
             select q;

But if I do it like this, it will have two problems:

How I can do this? How is this done with linq2sql or is this impossible to bypass?


Update

Upvotes: 1

Views: 360

Answers (1)

Gilad Green
Gilad Green

Reputation: 37299

As you said doing these operations in memory isn't good. The solution is to have another table in your database and then your code will be:

var ids = GetIds();
//Insert all ids into database (bulk insert)

var result = from q in ctx.Records
             join i in ctx.Ids on q.Id equals i.Id
             select q;

This way the operation will take place in the database and you will overcome the restriction of the amount of items in a in clause.

  1. The id field of Records should have an index (or if it is a primary key it already is also indexed)
  2. The id field of your new Ids table should also be indexed

This way the join will be a Hash join and not a Nested Join - equivalent for a .Net solution of using the ids collection as a HashSet<string> instead of a List<string>.

Last, to support many operations in parallel you can have your Ids with 2 columns: Id, OperationId. The OperationId will be given to all the Ids of a specific insert and then your query would look like:

var result = from q in ctx.Records
             join i in ctx.Ids on q.Id equals i.Id
             where i.OperationId = _the operationId given by the insert_
             select q;

_Make sure that the OperationId is indexed too - can be the same index of the Id but see that it is the first in order.


Depending on the DBMS you can also have your table as a temporary table - where at least for Oracle it stored the data per connected session - so you can insert from multiple sessions at once but the data isn't shared - so you don't need the OperationId

Upvotes: 1

Related Questions