Reputation: 416
I have a DataTable with 2,000,000 rows and two columns containing integer values.
What I need is to filter the DataTable inside a loop, efficiently.
I'm doing it like this:
for (int i = 0; i < HugeDataTable.Rows.Count; i++)
{
    tempIp = int.Parse(HugeDataTable.Rows[i]["col1"].ToString());

    // Scan the second table for rows matching col1, keep their col2 values
    var filteredUsers = tumu.Select("col1 = " + tempIp)
                            .Select(dr => dr.Field<int>("col2"))
                            .ToList();
    HashSet<int> filtered = new HashSet<int>(filteredUsers);

    // Mark which base users appear in the filtered set
    bool[] userVector2 = userVectorBase
        .Select(item => filtered.Contains(item))
        .ToArray();
    ...
}
What should I do to improve performance? I need every little trick. DataTable indexes and LINQ searches are what I came up with from a Google search. I'd like to hear your suggestions. Thank you.
Upvotes: 0
Views: 558
Reputation: 17964
You're effectively running a double loop: `DataTable.Select` scans `tumu` from scratch on every iteration of the outer loop, so if `tumu` contains a lot of rows it will be very slow.
Fix: build a dictionary of all users once, before your for loop. Inside the loop, just check the dictionary.
Something like this:
Dictionary<int, List<int>> usersByCode; // init + fill it from tumu before the loop
for (int i = 0; i < HugeDataTable.Rows.Count; i++)
{
    tempIp = int.Parse(HugeDataTable.Rows[i]["col1"].ToString());
    if (usersByCode.ContainsKey(tempIp))
    {
        // Do something
    }
}
Upvotes: 0
Reputation: 11
You may use Parallel.For:
Parallel.For(0, table.Rows.Count, rowIndex =>
{
    var row = table.Rows[rowIndex];
    // put your per-row calculation here
});
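One caveat worth adding to this: a DataTable is safe for concurrent *reads* only, so any shared result must be updated with thread-safe operations (e.g. `Interlocked`) or collected per-thread and merged afterwards. A minimal sketch, using made-up sample data:

```csharp
using System;
using System.Data;
using System.Threading;
using System.Threading.Tasks;

class ParallelDemo
{
    static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("col1", typeof(int));
        for (int i = 0; i < 1000; i++)
            table.Rows.Add(i);

        long sum = 0; // shared accumulator: never use plain += here
        Parallel.For(0, table.Rows.Count, rowIndex =>
        {
            var row = table.Rows[rowIndex];
            // Read-only access to the row is fine; the write to the
            // shared total goes through Interlocked to avoid races.
            Interlocked.Add(ref sum, row.Field<int>("col1"));
        });

        Console.WriteLine(sum); // 499500
    }
}
```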
Please have a look at this post
Upvotes: 1