Kemal Can Kara

Reputation: 416

filtering large DataTable in for loop

I have a DataTable with Rows.Count = 2,000,000 and two columns containing integer values.

What I need is to filter the DataTable inside a loop, efficiently.

I'm doing it with:

for (int i = 0; i < HugeDataTable.Rows.Count; i++)
{
  int tempIp = int.Parse(HugeDataTable.Rows[i]["col1"].ToString());

  var filteredUsers = tumu.Select("col1 = " + tempIp)
      .Select(dr => dr.Field<int>("col2"))
      .ToList();

  HashSet<int> filtered = new HashSet<int>(filteredUsers);

  Boolean[] userVector2 = userVectorBase
      .Select(item => filtered.Contains(item))
      .ToArray();

  ...
}

What should I do to improve performance? I need every little trick. DataTable indexes and LINQ searches are what I came up with from a Google search. I'd like to hear your suggestions. Thank you.

Upvotes: 0

Views: 558

Answers (2)

Carra

Reputation: 17964

You're effectively doing a double loop: the Select inside your for loop scans tumu again for every row. If tumu contains a lot of rows, it will be very slow.

Fix: build a dictionary of all users before your for loop, then check the dictionary inside the loop.

Something like this:

Dictionary<int, int> usersByCode; // init + fill it in before the loop
for (int i = 0; i < HugeDataTable.Rows.Count; i++)
{
  int tempIp = int.Parse(HugeDataTable.Rows[i]["col1"].ToString());
  if (usersByCode.ContainsKey(tempIp))
  {
    // Do something
  }
}
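Since the question maps each col1 value to the set of col2 users, the dictionary can be precomputed in one pass with GroupBy, replacing the per-row Select entirely. A minimal sketch, assuming tumu has integer columns col1 and col2 as in the question (the HashSet value type is my assumption of what the inner loop needs):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

class LookupDemo
{
    static void Main()
    {
        // Hypothetical stand-in for the question's "tumu" table.
        var tumu = new DataTable();
        tumu.Columns.Add("col1", typeof(int));
        tumu.Columns.Add("col2", typeof(int));
        tumu.Rows.Add(1, 10);
        tumu.Rows.Add(1, 11);
        tumu.Rows.Add(2, 20);

        // Build the lookup ONCE: col1 -> set of col2 values.
        // One pass over tumu instead of one Select() per outer row.
        Dictionary<int, HashSet<int>> usersByCol1 = tumu.AsEnumerable()
            .GroupBy(r => r.Field<int>("col1"))
            .ToDictionary(
                g => g.Key,
                g => new HashSet<int>(g.Select(r => r.Field<int>("col2"))));

        // Inside the big loop, each probe is now an O(1) dictionary hit:
        HashSet<int> filtered;
        if (usersByCol1.TryGetValue(1, out filtered))
        {
            Console.WriteLine(string.Join(",", filtered.OrderBy(x => x))); // 10,11
        }
    }
}
```

This turns the overall cost from roughly rows-of-HugeDataTable times rows-of-tumu into a single pass over each table.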

Upvotes: 0

Frédéric Wu

Reputation: 11

You may use Parallel.For:

Parallel.For(0, table.Rows.Count, rowIndex =>
{
    var row = table.Rows[rowIndex];
    // put your per-row calculation here
});
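One caveat worth noting: DataTable is documented as safe for concurrent reads but not for concurrent writes, so each parallel iteration should write only to its own state. A minimal sketch, assuming a single integer column (the column name and the doubling calculation are placeholders):

```csharp
using System;
using System.Data;
using System.Threading.Tasks;

class ParallelDemo
{
    static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("col1", typeof(int));
        for (int i = 0; i < 100; i++) table.Rows.Add(i);

        // Each iteration writes only to its own slot of the results
        // array, so no locking is needed and the DataTable is only read.
        var results = new int[table.Rows.Count];
        Parallel.For(0, table.Rows.Count, rowIndex =>
        {
            var row = table.Rows[rowIndex];
            results[rowIndex] = row.Field<int>("col1") * 2; // per-row calculation
        });

        Console.WriteLine(results[10]); // 20
    }
}
```

Combining this with the dictionary-lookup idea from the other answer (read-only dictionary, per-index result slots) keeps the parallel body free of shared mutable state.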

Please have a look at this post.

Upvotes: 1
