Andy Pham

Reputation: 47

Quickest way to search for objects in a very large list by string in C#

For example, I have classes like the ones below:

public class MasterRecord
{
    public int Id { get; set; }
    public string UniqueId { get; set; }
}

public class DetailRecord
{
    public int Id { get; set; }

    public int MasterRecordId { get; set; }

    public string UniqueId { get; set; }
}

I also have two lists: MasterList and DetailList.

MasterList has around 300,000 records and DetailList has around 7,000,000 records.

What I need is to loop over every record in MasterList and find the records in DetailList that have the same UniqueId.

Here is my code:

foreach (var item in MasterList)
{
    // Where() scans the whole DetailList (about 7,000,000 records) for every master record
    var matchPersons = DetailList.Where(q => q.UniqueId == item.UniqueId).ToList();

    if (matchPersons.Count > 0)
    {
        foreach (var foundPerson in matchPersons)
        {
            //Do something with foundPerson
            foundPerson.MasterRecordId = item.Id;
        }
    }
}

My code is running very slowly: each search takes about 500 milliseconds, so with 300k records it will take about 2,500 minutes to finish :( Is there any way to speed up this function? Thanks, and forgive my poor English.
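To put those numbers in context: the nested Where scan evaluates its predicate once per (master, detail) pair, so the work grows with the product of the two list sizes rather than their sum:

```latex
\begin{aligned}
300{,}000 \times 0.5\ \text{s} &= 150{,}000\ \text{s} = 2{,}500\ \text{min} \approx 41.7\ \text{h} \\
300{,}000 \times 7{,}000{,}000 &= 2.1 \times 10^{12}\ \text{predicate evaluations (string comparisons)}
\end{aligned}
```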

Update: I changed the code to make it clearer what I want to do.

Upvotes: 0

Views: 2388

Answers (3)

Harald Coppoolse

Reputation: 30464

If you need to handle "MasterRecords with their DetailRecords", don't use a normal Join; use a GroupJoin. Internally, this creates something similar to a lookup table.

The nice thing is that this also works with databases, CSV files, or whatever method you use to get your records. You don't have to convert them into lists first.

// Your input sequences; if desired, use IQueryable
IEnumerable<MasterRecord> masterRecords = ...
IEnumerable<DetailRecord> detailRecords = ...
// Note: query not executed yet!

// GroupJoin these two sequences
var masterRecordsWithTheirDetailRecords = masterRecords.GroupJoin(detailRecords,
    masterRecord => masterRecord.Id,             // from masterRecord take the primary key
    detailRecord => detailRecord.MasterRecordId, // from detailRecord take the foreign key

    // ResultSelector: from every MasterRecord with its matching DetailRecords select
    (masterRecord, matchingDetails) => new
    {
        // select the properties you plan to use:
        Id = masterRecord.Id,
        UniqueId = masterRecord.UniqueId,
        ...

        DetailRecords = matchingDetails.Select(detailRecord => new
        {
            // again: select only the properties you plan to use
            Id = detailRecord.Id,
            ...

            // not needed, you know the value:
            // MasterRecordId = detailRecord.MasterRecordId,
        }),
        // Note: this is still an IEnumerable!
    });

Usage:

foreach(var masterRecord in masterRecordsWithTheirDetailRecords)
{
    ... // process the master record with its detail records
}

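The nice thing is that if you only need to process some of the MasterRecords (for instance, after the 1000th you decide that you have found what you were looking for), or if you don't need all the DetailRecords of some MasterRecords, then no more MasterRecords are processed, and no more DetailRecords are projected, than necessary. LINQ takes care of that. Note that with LINQ to Objects the whole detail sequence is still read once, to build the internal lookup, as soon as enumeration starts.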
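A self-contained sketch of the same GroupJoin idea applied to the question's goal. Assumption: because MasterRecordId is the value the question is trying to fill in, this version joins on UniqueId on both sides instead of Id/MasterRecordId. The class GroupJoinSketch and method AssignMasterIds are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the classes in the question (defaults added to avoid nulls).
public class MasterRecord
{
    public int Id { get; set; }
    public string UniqueId { get; set; } = "";
}

public class DetailRecord
{
    public int Id { get; set; }
    public int MasterRecordId { get; set; }
    public string UniqueId { get; set; } = "";
}

public static class GroupJoinSketch
{
    // Hypothetical helper: copies each master Id into the detail records
    // that share its UniqueId (assumed join key).
    public static void AssignMasterIds(IEnumerable<MasterRecord> masterRecords,
                                       IEnumerable<DetailRecord> detailRecords)
    {
        var groups = masterRecords.GroupJoin(detailRecords,
            master => master.UniqueId,   // outer key
            detail => detail.UniqueId,   // inner key
            (master, details) => new { master.Id, Details = details });

        // Enumerating the query builds a lookup over detailRecords once,
        // then visits each master record together with its matching details.
        foreach (var entry in groups)
        {
            foreach (var detail in entry.Details)
            {
                detail.MasterRecordId = entry.Id;
            }
        }
    }
}
```

Usage would be `GroupJoinSketch.AssignMasterIds(MasterList, DetailList);`. Detail records whose UniqueId has no matching master keep their existing MasterRecordId.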

Upvotes: 0

Antonín Lejsek

Reputation: 6103

Using a hash-based structure is one of the best options, because each key lookup takes roughly constant time instead of scanning the whole list:

var detailLookup = DetailList.ToLookup(q => q.UniqueId);
foreach (var person in MasterList)
{
    foreach (var foundPerson in detailLookup[person.UniqueId])
    {
        //Do something with foundPerson
    }
}

The lookup returns an empty sequence if the key is not present, so you don't have to test for it.
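A runnable sketch of the lookup approach, assuming the goal from the question (copy each master Id into the MasterRecordId of the detail records with the same UniqueId). LookupSketch and AssignMasterIds are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the classes in the question (defaults added to avoid nulls).
public class MasterRecord
{
    public int Id { get; set; }
    public string UniqueId { get; set; } = "";
}

public class DetailRecord
{
    public int Id { get; set; }
    public int MasterRecordId { get; set; }
    public string UniqueId { get; set; } = "";
}

public static class LookupSketch
{
    public static void AssignMasterIds(List<MasterRecord> masterList, List<DetailRecord> detailList)
    {
        // One pass over detailList builds a hash-based ILookup<string, DetailRecord>.
        ILookup<string, DetailRecord> detailLookup = detailList.ToLookup(d => d.UniqueId);

        foreach (var master in masterList)
        {
            // The indexer returns an empty sequence when the key is absent.
            foreach (var detail in detailLookup[master.UniqueId])
            {
                detail.MasterRecordId = master.Id;
            }
        }
    }
}
```

Building the lookup is one pass over the 7,000,000 detail records, and each of the 300,000 probes is a hash lookup, so the total work is roughly proportional to the sum of the list sizes rather than their product. A variation not taken from this answer would be to hash the smaller list instead, for example with `MasterList.ToDictionary(m => m.UniqueId, m => m.Id)` and a single loop over DetailList; that assumes UniqueId is unique within MasterList, since `ToDictionary` throws on duplicate keys.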

Upvotes: 3

Anu Viswan

Reputation: 18155

You could use a Join on UniqueId.

var result = MasterList.Join(DetailList, m => m.UniqueId, d => d.UniqueId, (m, d) => d);
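The one-liner selects only the detail record, so the matching master's Id is not available afterwards. A sketch that keeps both sides of each match so the Id can be copied; JoinSketch and AssignMasterIds are hypothetical names, and UniqueId is assumed as the join key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the classes in the question (defaults added to avoid nulls).
public class MasterRecord
{
    public int Id { get; set; }
    public string UniqueId { get; set; } = "";
}

public class DetailRecord
{
    public int Id { get; set; }
    public int MasterRecordId { get; set; }
    public string UniqueId { get; set; } = "";
}

public static class JoinSketch
{
    // Returns the number of (master, detail) pairs that were matched.
    public static int AssignMasterIds(IEnumerable<MasterRecord> masterList,
                                      IEnumerable<DetailRecord> detailList)
    {
        // Keep both records in each result so master.Id is still reachable.
        var pairs = masterList.Join(detailList,
            m => m.UniqueId,
            d => d.UniqueId,
            (m, d) => (Master: m, Detail: d));

        int count = 0;
        foreach (var (master, detail) in pairs)
        {
            detail.MasterRecordId = master.Id;
            count++;
        }
        return count;
    }
}
```

Like GroupJoin, LINQ to Objects' Join hashes the inner sequence (here DetailList) once and then streams the outer one, so it avoids the nested scan from the question.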

Upvotes: 1
